Analytic model execution engine with instrumentation for granular performance analysis for metrics and diagnostics for troubleshooting

ABSTRACT

At an interface an analytic model for processing data is received. The analytic model is inspected to determine a language, an action, an input type, and an output type. A virtualized execution environment is generated for an analytic engine that includes executable code to implement the analytic model for processing an input data stream.

CROSS REFERENCE TO OTHER APPLICATIONS

This application claims priority to U.S. Provisional Patent Application No. 62/542,218 entitled MODEL EXECUTION ENGINE WITH INSTRUMENTATION FOR GRANULAR PERFORMANCE ANALYSIS FOR METRICS AND DIAGNOSTICS FOR TROUBLESHOOTING filed Aug. 7, 2017 which is incorporated herein by reference for all purposes.

BACKGROUND OF THE INVENTION

The use of data and analytics is becoming increasingly important for technical enterprises to widen competitive advantages in terms of scientific research and development, engineering efficiencies, and performance improvements. Efficiently implementing and leveraging such data and analytics is still a technical challenge for companies.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.

FIG. 1 is a functional diagram illustrating a programmed computer/server system for an analytic model engine in accordance with some embodiments.

FIG. 2A is a block diagram illustrating an embodiment of an analytic model engine system.

FIG. 2B is a block diagram illustrating an embodiment of an analytic model engine.

FIG. 2C is a block diagram illustrating an embodiment of an analytic model binding.

FIG. 2D is a block diagram illustrating an embodiment of an analytic model engine infrastructure representation.

FIG. 2E is a block diagram illustrating an embodiment of a fleet controller component.

FIG. 2F is a block diagram illustrating an embodiment of a system for an analytic model ecosystem.

FIG. 3 is a block diagram illustrating an embodiment of a system for an analytic engine pipeline.

FIG. 4A is a block diagram illustrating an embodiment of a system for an analytic engine ecosystem.

FIG. 4B is an example system component architecture of the FastScore platform.

FIG. 5A is an example of stream-related requests involving a series of roundtrips to a database through a Connect microservice.

FIG. 5B illustrates an example wherein a facade manages asynchronous notifications from running engines using web sockets.

FIG. 5C illustrates collecting outputs of an instance of an engine and posting them to the message queue.

FIG. 6 is a flow chart illustrating an embodiment of a process for an analytic model execution.

FIG. 7 is a flow chart illustrating an embodiment of a process for inspecting an analytic model (226).

FIG. 8 is a flow chart illustrating an embodiment of a process for generating a virtualized execution environment (222) for an analytic model.

FIG. 9 is a flow chart illustrating an embodiment of a process for adding a model sensor and/or a stream sensor.

FIG. 10 is a flow chart illustrating an embodiment of a process for a dynamically configurable microservice model for data analysis.

FIG. 11 is a flow chart illustrating an embodiment of a process for dynamic sensors (272).

FIG. 12 is a flow chart illustrating an embodiment of a process for dynamic configuration of a VEE (222).

FIG. 13 is a flow chart illustrating an embodiment of a process for deployment and management of model execution engines.

FIG. 14 is a flow chart illustrating an embodiment of a process for redeployment of model execution engines.

DETAILED DESCRIPTION

The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.

A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.

An analytic model execution engine with instrumentation for granular performance analysis for metrics and diagnostics for troubleshooting is disclosed. For scientific research, engineering, data mining, applied mathematical research, and/or analytics, once an analytical and/or computational model is developed for one platform, it is hard to rewrite and port it to other platforms. Furthermore, it would be useful to be able to take advantage of different data streams to input into the model and to monitor performance of the model on a computer system. The disclosed engine addresses these issues as well as various other technical challenges for efficiently implementing and leveraging such data and analytics for enterprises as further described below.

Abstraction of an analytic model from its operational execution environment is disclosed. A virtualized execution environment (VEE) is used to abstract the analytic model. In one embodiment, operating system-level virtualization such as a container is used for the VEE and represents an example used throughout this specification, but without limitation other virtualization levels may also be used for abstraction, including: a hardware-level virtualization such as a virtual machine (VM), application-level virtualization, workspace-level virtualization, and/or service virtualization. Container environments used may include Docker and/or LxC.

The designer of an analytic model, referred to herein as a “data science user”, uses one of a number of programming languages/tools including, for example, C, Python, Java, R, S, SAS (“Statistical Analysis System”), PFA (“Portable Format for Analytics”), H2O, PMML (“Predictive Model Markup Language”), SPSS, and MATLAB to articulate their analytic model that may use libraries/packages such as NumPy for scientific and numeric computing, BLAST for bioinformatics, and/or TensorFlow for machine learning. Given its programming language, the data science user incorporates design rules into the analytic model to permit abstraction of their model in a disclosed execution framework. The design rules also include specifying a schema for each input and output to the model.
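For illustration only, a minimal Python model conforming to such design rules might look like the following sketch; the schema annotations and the begin( ) initialization hook are assumptions for illustration rather than the platform's exact conventions, and only the action and yield code points are drawn from the design rules described herein.

# A hypothetical Python model written to the design rules: a schema is
# declared for each input and output, and the "action" code point
# scores one record and yields the result.

# fastscore.input-schema: sensor-reading     (assumed annotation form)
# fastscore.output-schema: score

def begin():
    # Optional initialization code point: load state and parameters.
    global coeff
    coeff = 2.0

def action(datum):
    # Score one input record; "yield" is the emit code point.
    yield coeff * datum["temperature"]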

The analytic model consumes data, typically in the form of a stream. The provider of analytic data, referred to herein as a “devops user”, uses one or more data sources including, for example, Apache Spark, Hadoop, Amazon Redshift, Azure SQL Data Warehouse, Microsoft SQL Server, and/or Teradata. The devops user also uses one or more example infrastructure systems including: on-premises hardware such as in-office computing and/or proprietary datacenter computing; and off-premises hardware such as cloud infrastructure including AWS (“Amazon Web Services”), Microsoft Azure, IBM BlueMix, and/or GCP (“Google Cloud Platform”). The devops user provides an I/O descriptor for each stream to permit abstraction of the I/O stream in the disclosed execution framework. Without limitation the data science user may be the same user as the devops user.

The analytic model abstraction and I/O descriptor abstraction are used in the design of a standardized container referred to herein as an “engine” to permit analytic models to be deployed/operationalized with their associated streams. In one embodiment, a containerized design approach is used for the engine container and its associated support containers such as a model connector, model manager, and dashboard, with each container providing a web service using an API, for example a RESTful API, to provide independently executable microservices. The approach provides a clean abstraction to the analytic design process and a clean abstraction to the data engineering and feeds. The container abstraction itself shares the advantages of containerized environments such as the Docker ecosystem, scaling, cloud ecosystems, and flexibility using RESTful APIs.

These abstractions divide and conquer analytics organizations to provide a novel type of user, referred to herein as an “analytic ops” specialist and/or user, with the ability to deploy/operationalize an analytic model independent of the language of articulation, the data/streams over which it executes, and the systems on which it runs. Data science users and quants focus on algorithm and model design with systems to explore and create algorithms until they have a model conforming to design rules for a model abstraction. Devops users such as dataops, data engineering, and/or IT specialists focus on establishing and maintaining feeds, operational data, and/or historical data streams for a stream abstraction. Devops users also build and/or maintain the on-premises and off-premises/cloud infrastructure for a container abstraction. Thus, the disclosed techniques allow the analytic ops user to be free to focus on tuning and deployment of any analytic model with true language neutrality under any infrastructure and with any data stream with true data neutrality, without requiring a deep understanding of data science, production data, and/or infrastructure.

This container, model, and stream abstraction approach addresses predictive analytics deployment challenges:

1. IT teams find it challenging to manually recode analytic models;
2. Complex, for example machine learning, analytic models are hard to deploy;
3. Updating production analytic models is too slow of a process;
4. It is challenging to support many analytic model languages, for example SAS, R, and Python;
5. Data Science and IT teams find it challenging to work together as they belong to different disciplines with different backgrounds, strengths, weaknesses, and/or perspectives; and
6. It is hard to scale scoring across enterprise data.

A dynamically configurable microservice model for data analysis using sensors is also disclosed. Data analytics often uses high performance profiling/tuning to provide efficient processing of data. Sensors are a programming object used to provide profiling for the analytic model and/or streams, and may be associated with metrics/tools for monitoring, testing, statistically analyzing, and/or debugging. A sensor may include: a code point related to the engine structure such as a model runner, input port, or output port; a sampling frequency; and/or a reporting frequency. Using an API such as a RESTful API, a sensor may be added at run-time, dynamically configured at run-time, and/or removed at run-time. Dynamic configuration of a sensor includes reconfiguration of a sensor parameter and/or threshold.

The engine and any container and/or VEE may be dynamically configured at run-time using an interface such as a RESTful API. Dynamic configuration of an engine includes changing a stream, an analytic model, an included library, and/or a cloud execution environment. The RESTful API may be used directly, via an SDK, via a CLI (“command line interface”), and/or via a GUI (“graphical user interface”) such as a dashboard.
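As a sketch only, such a run-time change might be issued programmatically as follows; the engine address, port, and endpoint path below are hypothetical and not the platform's documented API.

import requests

ENGINE = "http://engine-host:8003"  # hypothetical engine address and port

# Swap the engine's input stream descriptor at run-time via a RESTful
# call; the /1/stream/input path and payload shape are assumptions.
with open("new-input-stream.json") as f:
    response = requests.put(ENGINE + "/1/stream/input", data=f.read(),
                            headers={"Content-Type": "application/json"})
response.raise_for_status()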

Applications for a dynamically configurable microservice analytic model include, for example, being able to compare at run-time different stream sources, different streams, different languages for a given analytic model, and different analytic models in a champion/challenger style. Applications for dynamically configurable sensors include run-time debugging and/or run-time profiling until the engine is optimized, followed by run-time optimization by dynamically removing sensors for increased performance.

A deployment and management platform for model execution engine containers is also disclosed. As described above, many use cases for data science include a plurality of analytic model environments. Two or more model environments may be modularized with an engine for each model environment. The disclosed platform includes platform containers, such as: a model manager container for storing each model/engine; a connect container to provide discovery services for each model/engine; and a fleet controller to provide deployment services for each model/engine.

One application for a deployment and management platform includes deploying a “pipeline”, or a series of two or more engines wherein streams from the output of one engine are coupled to the input of another engine. As each VEE/container is self-contained, a pipeline may implement a complex analytic workflow that is cloud portable, multi-cloud, inter-cloud, hybrid cloud, system portable, and/or language neutral. For example, if it is determined using sensors after a first deployment that a third engine within a pipeline of five engines placed in a Microsoft Azure cloud infrastructure is lagging in performance, the pipeline's third engine may be dynamically moved at run-time to a Google GCP cloud infrastructure such that the remaining engines remain in Microsoft Azure.

Another application for the platform is dynamic scaling of an engine based on a concurrency model and feedback from a sensor and/or user, wherein the platform may dynamically at run-time spin up/instantiate additional containers for the analytic engine to be executed in parallel, and dynamically couple the appropriate I/O streams at run-time, whether a stream is training data or live data.

FIG. 1 is a functional diagram illustrating a programmed computer/server system for an analytic model engine in accordance with some embodiments. As shown, FIG. 1 provides a functional diagram of a general purpose computer system programmed to provide an analytic model engine in accordance with some embodiments. As will be apparent, other computer system architectures and configurations can be used for an analytic model engine.

Computer system 100, which includes various subsystems as described below, includes at least one microprocessor subsystem, also referred to as a processor or a central processing unit (“CPU”) 102. For example, processor 102 can be implemented by a single-chip processor or by multiple cores and/or processors. In some embodiments, processor 102 is a general purpose digital processor that controls the operation of the computer system 100. Using instructions retrieved from memory 110, the processor 102 controls the reception and manipulation of input data, and the output and display of data on output devices, for example display and graphics processing unit (GPU) 118.

Processor 102 is coupled bi-directionally with memory 110, which can include a first primary storage, typically a random-access memory (“RAM”), and a second primary storage area, typically a read-only memory (“ROM”). As is well known in the art, primary storage can be used as a general storage area and as scratch-pad memory, and can also be used to store input data and processed data. Primary storage can also store programming instructions and data, in the form of data objects and text objects, in addition to other data and instructions for processes operating on processor 102. Also as well known in the art, primary storage typically includes basic operating instructions, program code, data and objects used by the processor 102 to perform its functions, for example programmed instructions. For example, primary storage devices 110 can include any suitable computer-readable storage media, described below, depending on whether, for example, data access needs to be bi-directional or uni-directional. For example, processor 102 can also directly and very rapidly retrieve and store frequently needed data in a cache memory, not shown. The processor 102 may also include a coprocessor (not shown) as a supplemental processing component to aid the processor and/or memory 110.

A removable mass storage device 112 provides additional data storage capacity for the computer system 100, and is coupled either bi-directionally (read/write) or uni-directionally (read only) to processor 102. For example, storage 112 can also include computer-readable media such as flash memory, portable mass storage devices, holographic storage devices, magnetic devices, magneto-optical devices, optical devices, and other storage devices. A fixed mass storage 120 can also, for example, provide additional data storage capacity. One example of mass storage 120 is an eMMC or microSD device. In one embodiment, mass storage 120 is a solid-state drive connected by a bus 114. Mass storage 112, 120 generally store additional programming instructions, data, and the like that typically are not in active use by the processor 102. It will be appreciated that the information retained within mass storage 112, 120 can be incorporated, if needed, in standard fashion as part of primary storage 110, for example RAM, as virtual memory.

In addition to providing processor 102 access to storage subsystems, bus 114 can be used to provide access to other subsystems and devices as well. As shown, these can include a display monitor 118, a communication interface 116, a touch (or physical) keyboard 104, and one or more auxiliary input/output devices 106 including an audio interface, a sound card, microphone, audio port, audio recording device, audio card, speakers, a touch (or pointing) device, and/or other subsystems as needed. Besides a touch screen and/or capacitive touch interface, the auxiliary device 106 can be a mouse, stylus, track ball, or tablet, and is useful for interacting with a graphical user interface.

The communication interface 116 allows processor 102 to be coupled to another computer, computer network, or telecommunications network using a network connection as shown. For example, through the communication interface 116, the processor 102 can receive information, for example data objects or program instructions, from another network, or output information to another network in the course of performing method/process steps. Information, often represented as a sequence of instructions to be executed on a processor, can be received from and outputted to another network. An interface card or similar device and appropriate software implemented by, for example executed/performed on, processor 102 can be used to connect the computer system 100 to an external network and transfer data according to standard protocols. For example, various process embodiments disclosed herein can be executed on processor 102, or can be performed across a network such as the Internet, intranet networks, or local area networks, in conjunction with a remote processor that shares a portion of the processing. Throughout this specification “network” refers to any interconnection between computer components including the Internet, Bluetooth, WiFi, 3G, 4G, 4G LTE, GSM, Ethernet, TCP/IP, intranet, local-area network (“LAN”), home-area network (“HAN”), serial connection, parallel connection, wide-area network (“WAN”), Fibre Channel, PCI/PCI-X, AGP, VLbus, PCI Express, Expresscard, Infiniband, ACCESS.bus, Wireless LAN, HomePNA, Optical Fibre, G.hn, infrared network, satellite network, microwave network, cellular network, virtual private network (“VPN”), Universal Serial Bus (“USB”), FireWire, Serial ATA, 1-Wire, UNI/O, or any form of connecting homogenous, heterogeneous systems and/or groups of systems together. Additional mass storage devices, not shown, can also be connected to processor 102 through communication interface 116.

An auxiliary I/O device interface, not shown, can be used in conjunction with computer system 100. The auxiliary I/O device interface can include general and customized interfaces that allow the processor 102 to send and, more typically, receive data from other devices such as microphones, touch-sensitive displays, transducer card readers, tape readers, voice or handwriting recognizers, biometrics readers, cameras, portable mass storage devices, and other computers.

In addition, various embodiments disclosed herein further relate to computer storage products with a computer readable medium that includes program code for performing various computer-implemented operations. The computer-readable medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of computer-readable media include, but are not limited to, all the media mentioned above: flash media such as NAND flash, eMMC, SD, compact flash; magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as optical disks; and specially configured hardware devices such as application-specific integrated circuits (“ASIC”s), programmable logic devices (“PLD”s), and ROM and RAM devices. Examples of program code include both machine code, as produced, for example, by a compiler, or files containing higher level code, for example a script, that can be executed using an interpreter.

The computer/server system shown in FIG. 1 is but an example of a computer system suitable for use with the various embodiments disclosed herein. Other computer systems suitable for such use can include additional or fewer subsystems. In addition, bus 114 is illustrative of any interconnection scheme serving to link the subsystems. Other computer architectures having different configurations of subsystems may also be utilized.

FIG. 2A is a block diagram illustrating an embodiment of an analytic model engine system. Some of the blocks shown in FIG. 2A are virtualized execution environments, for example Docker containers, hosted by one or more computer/server systems in FIG. 1 on premises or in a cloud infrastructure. In one embodiment, an engine (202) is a container comprising an analytic model and providing a portable and independently executable microservice, for example via a web service with a RESTful API. The engine (202) is coupled to an input stream (204) for data input and an output stream (206) for data output including scoring the analytic model based on the data input.

The engine (202) is coupled to a connect container (208), which provides discovery service for the engine (202) and other containers, for example to establish a system of determining the IP address of the engine (202) for contact via the RESTful API. The connect container (208) is coupled to a model manager database (210) to store abstractions as static descriptions comprising: models, schemas, I/O descriptors, sensors, model environment abstractions, engine environment abstractions, and/or model I/O tests. These descriptions are referred to herein as “static” in the sense that they are a configuration bound to a file prior to or during run-time. The connect container (208) is also coupled to a fleet controller container (212) that binds a description in the model manager (210) to run-time abstractions in the engines (202) and orchestrates communication between users and engines and between engines, for example via an SDK (“software development kit”), a CLI, and/or a dashboard GUI.

Utilizing the fleet controller (212), model manager (210), and/or connect (208) containers, a pipeline of engines may be established, here shown to connect the output stream of the engine (206) to an input stream (214) of a second engine (216). Within the system of FIG. 2A, any number of engines may be statically designed or dynamically spun up, and given the interoperability and ubiquity of containers, for example Docker containers, in cloud computing, virtually an unlimited number of engines may be used, here shown up to an nth engine (218).

FIG. 2B is a block diagram illustrating an embodiment of an analytic model engine. In one embodiment, the system of FIG. 2B is represented by engines (202), (216), (218) in FIG. 2A. In one embodiment, the engine is a Docker container, but without limitation any other VEE may be used, for example a VMWare virtual machine.

Engine (222) is associated with an analytic model. An analytic model abstraction (224) is used to abstract a given analytic model, comprising:

1. The model's programming language, for example, Java, Python, C, or R;
2. One or more input schema for the model, wherein the input schema comprises: one or more data types, constraints, and data science guards. In one embodiment, a schema is language neutral and may be expressed in a declarative language and/or execution neutral language such as JSON, XML, and/or Avro. This schema may include a specification for: a data type such as integer, float, and/or Boolean; constraints such as non-negative values and/or positive values; and data science guards such as a specification of a probability distribution and/or a standard deviation with tolerances (an example schema is sketched after this list);
3. One or more output schema for the model, wherein the output schema comprises: one or more data types, constraints, and data science guards; and
4. Language specific execution code points, wherein the code points at a minimum comprise: an action to start scoring and/or computing; and a yield/emit to start outputting score results. This includes specifying an appropriate/language-specific code point for execution referred to herein as an “action”. Examples of an action include the main( ) loop in C, the code between the keyword action and keywords yield/emit in R, and so forth.
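As an illustration of item 2 above, an input schema might be written as an Avro-style record, shown here as a Python dictionary; the "constraint" and "guard" fields are hypothetical extensions, since standard Avro does not define constraint or data science guard attributes.

# An illustrative Avro-style input schema as a Python dictionary. The
# "constraint" and "guard" entries are hypothetical extensions used to
# express the constraints and data science guards described above.
input_schema = {
    "type": "record",
    "name": "SensorReading",
    "fields": [
        {"name": "temperature", "type": "double",
         "constraint": "non-negative",                         # hypothetical
         "guard": {"distribution": "normal", "stddev": 2.5}},  # hypothetical
        {"name": "count", "type": "int"},
    ],
}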

For the given analytic model (226) bound to the abstraction (224), allocation is made within the container (222) for a model runner (228) responsible for providing an execution environment for the language specified by the model (226) in its abstraction (224) and/or an inspection of the model (226). For example, if the language is C, the model runner (228) may include appropriate C libraries and dependencies for code execution, and if the language is Python, the model runner (228) may include the Python 2 or Python 3 interpreter with the appropriate Python packages. Allocation is also made for a model state store (230) within container (222) based on inspection of the model (226).

Engine (222) is associated with one or more input streams. An I/O descriptor abstraction (232) is used to abstract a given input stream, comprising:

1. A schema, comprising: one or more data types, constraints, and data science guards; and
2. An operating configuration, comprising: a transport type, one or more endpoint specifics, and an encoding. One example would be a transport type of a file in a filesystem, an endpoint specific being a file descriptor, and an encoding being JSON. Another example would be a transport type of a Kafka stream, an endpoint specific being a Kafka bootstrap server configuration, and an encoding being AVRO binary (an illustrative descriptor for this example follows this list).
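For illustration, the Kafka example in item 2 might be expressed as follows, shown as a Python dictionary; the exact field names are assumptions, as the document does not fix a descriptor syntax at this point.

# An illustrative I/O descriptor for the Kafka example above; field
# names are assumptions for illustration.
input_descriptor = {
    "schema": {"type": "double"},   # data types, constraints, guards
    "transport": {
        "type": "kafka",
        "bootstrap_servers": ["kafka-1:9092", "kafka-2:9092"],
        "topic": "cme-incremental-updates",
    },
    "encoding": "avro-binary",
}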

The abstraction (232) is bound to an input stream (234) and coupled to an appropriate input port (236) in the engine (222). Likewise an output is associated with another I/O descriptor abstraction (232), bound to an output stream (240), and coupled to an appropriate output port (242) in the engine (222). One or more internal/hidden blocks such as a manifold (244) may provide services comprising:

1. a schema checker to compare the input schema from the analytic model abstraction (224), which may be provided by a data science user, with the input I/O descriptor abstraction (232), which may be provided by a devops user (a sketch of this check follows this list); and
2. serialization and deserialization and/or encoding/decoding of a stream (234), (240), for example to deserialize into a Python object.
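A minimal sketch of the manifold's schema check follows; real schema comparison (for example, Avro schema resolution) is more involved, and the exact-match test and field names here are assumptions for illustration.

# Sketch of the consistency check between the model abstraction's input
# schema (from the data science user) and the stream's I/O descriptor
# schema (from the devops user).
def check_binding(model_abstraction, io_descriptor):
    model_schema = model_abstraction["input_schema"]
    stream_schema = io_descriptor["schema"]
    if model_schema != stream_schema:
        raise ValueError("schema mismatch: model expects %r, stream provides %r"
                         % (model_schema, stream_schema))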

In this embodiment, the engine (222) has two primary functions: first, it binds an analytic model (226) to an abstraction (224) and streams (234), (240) to an I/O descriptor abstraction (232), using a consistency check on the appropriate input and output schemas in any order of binding; second, it executes the model by invoking the model runner (228).

FIG. 2C is a block diagram illustrating an embodiment of an analytic model binding. In one embodiment, the system of FIG. 2C is represented by model binding (226) in FIG. 2B and execution is served by model runner (228) in FIG. 2B. After the data science user follows the design rules in crafting the analytic model algorithm in a specific programming language and binds it to abstraction (224), the system interprets the code, inspecting the model code for code snippets that are associated with, for example: action, emit/yield, whether state is saved externally, concurrency flags, and whether state is shared. After the model execution framework is interpreted and/or inspected, the system generates a VEE such as a Docker container for the analytic engine (222) to include executable code (252) to implement the analytic model as shown in FIG. 2C.

Language specific engines comprising compilers/interpreters, toolchains, debuggers, profilers, development environments, libraries, and/or dependencies are instantiated to permit a flexible selection of programming languages including Python (254), C (256), R (258), and Java (260), and an ellipsis is used in FIG. 2C to indicate there may be other languages not shown in FIG. 2C such as PFA, MATLAB, and SAS.

In FIG. 2C the analytic model code itself to be executed is referred to herein as the model action context (262), and an associated model state (264) is used consistent with the above inspection/interpretation of state initialization and state management. Stream processor (266) is a component that may perform serialization and deserialization of input data (268) and output data (270). Stream processor (266) may comprise drivers for different data sources and for different output destinations. Stream processor (266) may provide data type safety mechanisms, like a computer science guard, to enforce/flag data typing such as integers and constraints. Stream processor (266) may provide data science guards to enforce/flag statistical properties such as standard deviation.

The model (252) may include one or more sensors (272), components that debug and/or profile the model. A sensor may allow a user to instrument, for example, how much memory is being used in the action context (262), how much memory is being used in the stream processor (266), how many CPU cycles are being used by math in the action context (262), how many CPU cycles are being used serializing a data stream (266), how many CPU cycles are being used deserializing a data stream (266), and so on. For example, a sensor may facilitate continuous tuning. A sensor may be dynamically configurable such that it may be added, activated, reconfigured, deactivated, and/or removed at run-time via an API, for example a RESTful API.

An example of a sensor (272) for engine instrumentation comprises a description of: what to measure; sampling intervals; and output intervals, for example:

{
  "Tap": "sys.memory",
  "Activate": {
    "Type": "regular",
    "Interval": 0.5
  },
  "Report": {
    "Interval": 3.0
  }
}
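As a sketch, the sensor descriptor above might be installed and later removed at run-time through the engine's RESTful API; the endpoint paths and the returned sensor identifier below are assumptions for illustration.

import requests

ENGINE = "http://engine-host:8003"  # hypothetical engine address and port

sensor = {
    "Tap": "sys.memory",
    "Activate": {"Type": "regular", "Interval": 0.5},
    "Report": {"Interval": 3.0},
}

# Attach the sensor at run-time; assumed endpoint and response shape.
response = requests.post(ENGINE + "/1/sensor", json=sensor)
sensor_id = response.json()["id"]   # assumed: install returns a sensor id

# Once tuning is complete, remove the sensor for increased performance.
requests.delete(ENGINE + "/1/sensor/" + str(sensor_id))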

The model (252) may include a web server (274), providing microservices over HTTP via an API, for example a RESTful API. The system in FIG. 2A, FIG. 2B, and FIG. 2C may in one embodiment have a standard API and/or RESTful API to allow a consistent programmatic approach to interfacing with various components in the system.

FIG. 2D is a block diagram illustrating an embodiment of an analytic model engine infrastructure representation. In one embodiment, the engine (222) of FIG. 2B is represented in plurality in FIG. 2D by engine A (222 a), engine B (222 b), and so on to any number M for engine M (222 m).

In various embodiments and without limitation, a VEE used for an engine (222) is a Docker and/or Linux LxC container. A containerized engine (276) is used to provide the operating-system level virtualization of each container. Containerized engine (276) may include other container services such as container deployment, scaling, clustering, and management, for example via Kubernetes, Mesos, and/or Docker Swarm.

In the case of a virtualized operating-system paradigm, the host OS (278) of on-premises hardware and/or off-premises cloud resources may include core stacks and dependencies available across all engines and management containers. This may include a core networking stack (280) with universal protocols such as TCP/IP and/or internal protocols used for the system such as Kafka. Similarly, a core filesystem stack (282) may be used and other core packages, libraries, and/or dependencies (284) may be installed in the host OS (278).

FIG. 2E is a block diagram illustrating an embodiment of a fleet controller component. In one embodiment, the fleet controller (212) of FIG. 2A is represented in FIG. 2E.

The fleet controller component provides a plurality of interfaces to a user and/or software developer to interface with the analytic engine ecosystem. A fleet controller (212) binds high-level static descriptions in the model manager (210) to the analytic engine run-time abstraction (222). In one embodiment, the fleet controller (212) may also bind containers with infrastructure or provide an interface to orchestrate containers and infrastructure using a third-party tool such as Kubernetes, Docker Swarm, and/or Mesos. There are at least three interfaces available for the fleet controller (212):

1. A fleet controller SDK (286) that permits a software developer to integrate binding services to their own operational environment. In this case, the model manager (210) may be bypassed and/or unused and the software developer may use their own store for descriptions. The SDK (286) may utilize a RESTful API to communicate with the engines (222) and/or other components of the analytic engine ecosystem;
2. A CLI (288) that permits shell scripting of binding services for automation and integration for an analytic ops user. The CLI (288) may encapsulate and simplify using the same exposed RESTful API that the SDK (286) would use; and
3. A dashboard user interface (290), for example a GUI, for an analytic ops user to operate via, for example, a browser. The dashboard (290) may encapsulate and simplify using the same exposed RESTful API that the SDK (286) would use.

In various embodiments and without limitation, a VEE is used for the CLI server (288) and/or dashboard server (290), such as a Docker and/or Linux LxC container.

FIG. 2F is a block diagram illustrating an embodiment of a system for an analytic model ecosystem.

As described above, data science users provide models (252) while devops users and/or IT users provide data streams such as data sources (234) and sinks (240) for use with the model engine (222).

The analytics ops specialist may use the fleet controller (212), for example a dashboard (290) via a browser, to then interface the model (252), streams (234, 240), and engine (222) with an analytic model ecosystem, comprising:

1. Model management (210), for example a model manager (210) component/container as described above;
2. Model inspection (292), for example a component/container dedicated to inspecting models to identify programming language and conformity with design rules;
3. Model deploy (293), for example a component/container to regulate and scale model deployment as described below;
4. Model compare (294), for example a component/container to compare two or more models using various techniques, for example a champion/challenger approach; and/or
5. Model verify (295), for example a component/container to aid in model verification.

FIG. 3 is a block diagram illustrating an embodiment of a system for an analytic engine pipeline. In one embodiment, each of the engines depicted in FIG. 3 is an engine (222) as shown in FIG. 2B.

As an example and without limitation, an analytics ops specialist deploys two five-engine pipelines, each comprising engine 1, engine 2, engine 3, engine 4, and engine 5, with each type of engine being identical across the two pipelines; for example engine 2 (310 a) is identical to engine 2 (310 b), engine 3 (312 a) is identical to engine 3 (312 b), and engine 4 (314 a) is identical to engine 4 (314 b). One pipeline (302) is deployed in a GCP cloud wherein a GPGPU (general-purpose graphics processing unit) instance is available for numerical analysis. The other pipeline (304) is deployed in an Azure cloud where no such GPGPU instance is available for processing.

At the start of this example, pipeline (304) uses the stream routing as shown with dot-dash lines (322) and (324) to utilize engine 3 (312 b). Sensors (272) are deployed in each of the models associated with engines (310)-(314), and it is noted that the performance of engine 3 (312 b) is lower than that of engine 3 (312 a). The model associated with engine 3 (312) is further noted to be computationally intensive and significantly faster with GPGPU processing.

At run-time, and without any reprogramming of the analytic model and/or participation of a data science user, a separate user may use the RESTful API in conjunction with the fleet controller (212) to reroute engine 2 (310 b) from engine 3 (312 b) to engine 3 (312 a) in the GCP cloud with a GPGPU instance, as shown with the solid lines (332) and (334).

With the model abstraction, stream abstraction, and container abstraction, changing the infrastructure of each unit such as an engine, for example from Azure to GCP, is facilitated. For example, a user may use the dashboard (290) to specify:

1. Spinning up a second instance of engine 3 (312 a) in GCP;
2. Routing the output stream of engine 2 (310 b) to the input stream of the newly instantiated engine 3 (312 a) via route (332); and
3. Routing the output stream of the newly instantiated engine 3 (312 a) via route (334).
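A sketch of these steps as REST calls through the fleet controller follows; the endpoint paths, payloads, and engine/stream names are assumptions for illustration, not the platform's documented API.

import requests

FLEET = "http://fleet-controller:8000"  # hypothetical fleet controller address

# 1. Spin up a second instance of engine 3 in GCP (assumed endpoint).
requests.post(FLEET + "/1/engines",
              json={"model": "engine-3", "infrastructure": "gcp-gpgpu"})

# 2. Route the output stream of engine 2 (310 b) to the input stream of
#    the newly instantiated engine 3 (route 332).
requests.put(FLEET + "/1/engines/engine-2b/output-stream",
             json={"stream": "engine-3-gcp-input"})

# 3. Route the output stream of the newly instantiated engine 3 back
#    into the pipeline (route 334).
requests.put(FLEET + "/1/engines/engine-3-gcp/output-stream",
             json={"stream": "engine-4b-input"})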

FIG. 4A is a block diagram illustrating an embodiment of a system for an analytic engine ecosystem. In one embodiment, the analytic engine system (402) is an example system as shown in FIG. 2A, with typical usage via a dashboard (212), (290) coupled to a user's browser (404).

The disclosed analytic engine system (402) is designed to interface with a plurality of data science tools (412) including, for example, the following: C, C++, C#, Python, Java, R, S, SAS, PFA, PrettyPFA, H2O, PMML, SPSS, Julia, Magma, Maple, Mathematica, and MATLAB. In this example, libraries and/or packages such as NumPy for scientific and numeric computing, BLAST for bioinformatics, and/or TensorFlow for machine learning are supported. Data science development environments such as Jupyter, IPython, Sage, Apache Zeppelin, and Beaker may be integrated via a plug-in and/or other interface.

The disclosed analytic engine system (402) is designed to interface with a plurality of existing data sources (414) including, for example, the following: Apache Spark, Hadoop, Amazon Redshift, Azure SQL Data Warehouse, Microsoft SQL Server, and Teradata. The disclosed analytic engine system (402) is designed to interface with a plurality of front-facing apps and dashboards (416) including, for example, the following: Guidewire, JSON, Llamasoft, Grafana, Domo, and Tableau.

With its modular approach, the system (402) is designed to work with existing IT ecosystems and/or tooling including, for example, the following: microservices orchestration such as Kubernetes, DC/OS, and Cloudfoundry; microservices messaging such as Kafka and Confluent; microservices monitoring such as Sysdig and Prometheus; microservices networking such as Weave and Project Calico; microservices foundations such as Docker; operating systems such as CoreOS, Ubuntu, Redhat, Linux, Windows, and MacOS; and infrastructure such as on-premises, off-premises, AWS, Azure, and GCP.

FastScore Technical Architecture of a Platform for Deploying Analytic Engines.

An example technical architecture of a platform for deploying analytic engines (222) will now be described. The FastScore™ Microservices Suite (“FastScore”) is a platform for predictive analytics over data, which are in standard forms including message bus microservices streams (e.g., Kafka), distributed memory systems (e.g., Spark and/or Storm), distributed disk systems (e.g., Hadoop), object stores (e.g., Amazon S3), and other data storage systems. A fundamental unit of work of the platform is an analytic deployment engine (222), which is also referred to herein as a scoring engine or an analytic engine. The analytic deployment engine (222) is designed to execute analytics generally generated from PFA and/or analytic models (e.g., models) from other languages, such as PrettyPFA, R, Python, Julia, and/or other languages. Models that are expressed in PFA before deploying and executing in FastScore generally have a higher level of safety and visibility.

For example, one example architecture for FastScore provides for the following technical features for deploying analytic engines (222) and providing a platform for predictive analytics over data:

1. Scales to tens of thousands of deployed models and hundreds of concurrent users;
2. Executes models with or without streaming frameworks (e.g., Hadoop, Spark, Storm);
3. Provides for convenient authoring and translation/conversion of PFA/other models;
4. Provides detailed instrumentation and visibility of running analytic engines including:
   a. Model details generated from PFA static analysis;
   b. JVM details from compiled and running PFA models;
   c. System performance metrics; and
   d. Data science metrics derived from applying PFA models to data sources; and
5. Avoids manual configuration whenever possible.

For example, the disclosed architecture allows for push button deployment and continuous monitoring for analytics. Data science teams may rapidly iterate on their models without IT implementation delays (e.g., update their analytic models, reconfigure their analytic models, test/debug their analytic models, and/or perform AB testing of their analytic models). Production analytic teams may monitor and tune analytics after they are deployed based on performance measurements/monitoring (e.g., instrument their deployed analytic models and dynamically scale their deployed analytic models based on monitoring/measurements using sensors). In an example deployment, IT may leverage a Docker® container ecosystem to manage deployment of any data science asset.

In an example implementation, FastScore is a secure PFA-based solution for predictive analytics. FastScore may be deployed to a variety of software platforms. For instance, FastScore may run as a Linux process, or as a standalone Mesos task, or as a ‘bolt’ of a Storm graph. A particular deployment of FastScore may use a unique mix of technologies such as described herein (e.g., Kafka or NSQ for a message queue, Spark or Storm for a streaming framework, Mesos or YARN for a cluster scheduler, etc.), and may be selected based on design choices of enterprises/customers or for other performance/cost reasons.

These and other aspects of the disclosed architecture for FastScore are further described below.

Microservices.

In one embodiment, each component of the FastScore platform is delivered as a microservice. This microservice architecture approach favors deployment flexibility, increased ability to automate with standard system level tools (e.g., Docker or similar tools), as well as the ability to scale services and compose new services with system administrators and DevOps professionals rather than specialized developers and tools. For example, the FastScore microservices may generally follow a FastScore Microservice Pattern embodiment with a stack of:

1. Core Service Libraries and Programs; on top of
2. FastScore APIs and Interfaces; on top of
3. Communication Services; on top of
4. Docker Linux.

Referring to the above, each of the FastScore microservices and/or components will now be described, providing example implementation details for each of these microservice layers.

Core Service Libraries and Programs:

Code at this layer provides the core semantics and functionality of the microservice. For example, an Engine microservice may provide a PFA-based score service by embedding a Hadrian PFA JIT compiler at this level.

FastScore APIs and Interfaces:

Code at this layer maps the core functionality to the communication services layer and embeds the specific semantics that define the microservices. The demo servlet wrapper for Hadrian is an example of code at this layer, which maps specific Hadrian-based semantics to Apache Tomcat.

Communication Services:

These are the third party libraries and systems that support the specific communication schemes that clients use to interact with the microservices. Different services may use different languages, systems, and libraries to implement this layer. In this example, given that the Engine microservice utilizes the Hadrian library which is written in Scala, Tomcat may be used to provide the REST API semantics at this level. A Model Manager microservice on the other hand may use Cowboy (e.g., or another HTTP server implementation) at this level.

Docker Linux:

Docker may be utilized as the unit of deployment for all FastScore microservices. In this way, FastScore microservices may be composed, deployed, and managed using standard schedulers and other tools in the Docker eco-system, such as Kubernetes, Swarm, Apache Mesos, and/or other commercially available and/or open source schedulers/tools for the Docker eco-system. In this example, the only restriction in packaging the microservice into the Docker container is that it generally does not have any reliance on special kernel modifications, because each Docker container shares the Linux kernel of the host OS.

One example of the FastScore microservice patterns for the Engine Microservice (222) for executing analytic engines is a stack:

1. Hadrian; on top of
2. API/Interfaces; on top of
3. Tomcat and Kafka; on top of
4. Docker Linux, to be coupled with REST clients, for example through a RESTful API, and also to be coupled with Kafka peers.

An example of the FastScore microservice patterns for the Model Manager Microservice (210) for managing deployed analytic engines is a stack:

1. Mnesia; on top of
2. API/Interfaces; on top of
3. Cowboy; on top of
4. Docker Linux, to be coupled with REST clients, for example through a RESTful API.

An example of the FastScore microservice patterns for the Model Inspect Microservice (292) for inspecting analytic models is a stack:

1. Titus; on top of
2. API/Interfaces; on top of
3. Django; on top of
4. Docker Linux, to be coupled with REST clients, for example through a RESTful API.

Unit Performance and Complexity at Scale.

Performance is generally reduced for any given instance of a microservice due to the variety of I/O boundaries. Scaling a microservices architecture may lead to management and operational complexity. In order to deal with both of these potential complexities and/or tradeoffs, an automation framework may be provided to maintain operational overhead constant relative to scale and may allow scale to overcome the per container performance penalties by allowing concurrent execution and I/O paths.

User Interface (UI): Dashboard Microservice.

The Dashboard microservice (290) may also follow the FastScore microservice pattern. An example of the FastScore microservice patterns for the Dashboard Microservice is a stack:

1. Material, Angular, and/or other UI framework; on top of
2. API/Interfaces; on top of
3. HTTPD and/or NGINX; on top of
4. Docker Linux, to be coupled with browser clients.

In an example deployment, the disclosed and/or backend microservices components of FastScore may be deployed and executed in the cloud or on dedicated servers in an enterprise data center.

The FastScore microservices may be implemented to support REST and command-line (CLI) interfaces. For example, some of the FastScore microservices may support interfaces using graphical user interface (GUI) functionality within a Dashboard. The CLI commands may be made self-explanatory and may correspond closely to REST API calls.

Backend Microservices.

In this example, there are two backend microservices/components: a Facade and an Engine. As described herein, a “facade” (in lowercase letters) or an “engine” (in lowercase letters) is used to refer to an instance of the corresponding microservice/component. A FastScore cluster may have many facades and many engines. Both microservices/components may be packaged as Docker images. Both microservices/components may run as a Linux process or as a task under cluster schedulers, such as using Marathon/Mesos, and/or other commercially available and/or open source schedulers for the Docker eco-system.

There are several standard components FastScore generally utilizes to function. An example list of such components includes Kafka and/or NSQ (e.g., providing a message queue), MySQL (e.g., providing a database), and HAProxy (e.g., providing a load balancer). FastScore generally does not depend on a particular message queue or a database. The FastScore platform may be implemented to flexibly support a variety of existing and/or new message queues or databases.

RESTful APIs.

The FastScore platform may be implemented to support various interfaces between components/microservices. For example, RESTful APIs are provided to support communications between components/microservices of the FastScore platform (e.g., over HTTPS and/or other network communication protocols).

Example System Component Architecture.

FIG. 4B is an example system component architecture of the FastScore platform. As shown, the FastScore platform may include a plurality of engines (222), a Connect component/microservice (208), a Model Manager component/microservice (210), and a Dashboard component/microservice (290) and/or Fleet Controller (212).

Connect Microservice.

Referring to FIG. 4B, the Connect component/microservice (208) is the central communication point of a multi-microservice FastScore implementation. In this example, Connect (208) publishes a REST API accessible by the UI (e.g., Dashboard). Any communication between the UI and the backend can be performed through an instance of the Connect microservice (208). In some cases, a FastScore implementation may use a load balancer (422) in front of a fleet of Connect instances (208) for workload balancing across the fleet.

For example, the Connect microservice (208) may be configured to generally accept requests from other components/microservices of the FastScore platform and typically translate them into calls to an API of a standard component, such as MySQL, Kafka, and/or another component/microservice. Connect instances (208) may be implemented as stateless and thus interchangeable. As such, a load balancer (422) such as HAProxy may choose a different Connect instance (208) to serve the next request from the same client.

Furthermore, other components of the ecosystem may be added or developed, including an integrated development environment (424) such as Jupyter, a cluster scheduler (426) such as Marathon, Mesos, and/or YARN, a distributed disk system (428) such as Hadoop, a distributed compute/memory system (430) such as Spark and/or Storm, a message bus stream/queue (432) such as Kafka, a database (434) such as MySQL, and a distributed object store (436) such as S3.

A Connect instance (208) may be implemented to use a database for persistent data and a message queue for transient asynchronously-delivered data. Other components may access the database and the (system) message queue through a Connect API. In this example implementation, engines (222) may access message queues that contain input data directly. The persistent state may include user profiles, their models, and stream descriptors.

User Management.

In an example implementation, calls to a facade are generally required to be authenticated. As a convenience, a user may login to the FastScore platform permanently using the CLI built into Connect (208) to provide their user credentials (e.g., username and password for their FastScore login), which may then store the user's credentials on their local machine (e.g., local computing device), and subsequent FastScore commands may be configured to use them by default (e.g., and does not require the use of session cookies, in which calls can be checked using the Basic HTTP authentication method).

Stream Management.

As similarly described above, the FastScore Engine (222) operates on data streams. Users may specify/describe data streams (e.g., input/output schemas may be utilized as a data stream descriptor) and later use them to execute their analytic models. For instance, a data stream descriptor may be a JSON, AVRO, and/or another encoding/resource, and its contents generally depend on the nature of the data stream, such as provided in the below example.

{
  "description": "CME Incremental Refresh Messages",
  "connect": {
    "type": "kafka",
    "topic": "cme-incremental-updates"
  }
}

A user may manipulate stream descriptors as follows:

fastscore stream add <stream-name> <str1.json>
fastscore stream list
fastscore stream show <stream-name>
fastscore stream remove <stream-name>

In some cases, utility commands may be provided with the platform to help users write stream descriptors. Examples of such utility commands may include the following examples:

fastscore stream sample <stream-name>   # get sample data from stream
fastscore stream kafka-topics           # list topics from local Kafka

FIG. 5A is an example of stream-related requests involving a series of roundtrips to a database through a Connect microservice.

Model Management.

In this example implementation, model management is similar to the management of stream descriptors, as shown in the following and indicated by coupling 1, coupling 2, and coupling 3 in FIG. 5A:

fastscore model add <model1.json>
fastscore model list
fastscore model show <model-name>
fastscore model remove <model-name>

A model may have different representations. The canonical representation may be implemented using a JSON document, a PrettyPFA representation, or another representation. All requests and responses related to a model may generally use a correct content type as given in the table below:

Model format    Content-Type                 Note
JSON            application/vnd.pfa+json
PrettyPFA       application/vnd.ppfa

In this example, the facade can derive the content type from the extension of the requested resource. For example, GET /1/models/<model-name>.ppfa requests a PrettyPFA representation of the model.

Asynchronous Notifications.

FIG. 5B illustrates an example wherein a facade manages asynchronous notifications from running engines using web sockets. The notifications may include log messages and engine outputs. A UI component may generally be ready to reconnect a web socket upon error, as the facade may restart at any time.

The facade itself receives the notifications it posts to the web socket from a (system) message queue. Engines are generally not connected to a (system) message queue directly. They use a facade to post log messages and model outputs to the (system) message queue under a topic named /<user-name>/notifications.

To receive asynchronous notifications, a user may issue a fastscore watch command or wait for completion of a model using fastscore model wait <model-name>. The GUI may have the asynchronous channel open at all times.

Startup.

In this example implementation, when launched, a facade generally has no prior knowledge of its surroundings. The Docker image is the same for all facades and no configuration is typically passed to it. A facade may use autodiscovery to connect to other components, and/or may also find and register with a load balancer (422) (if any). The facade may also use autodiscovery to locate a database (434) and a (system) message queue (432).

The facade may also automatically discern between running as a standalone process or as a task under control of a scheduler (426), such as using Marathon/Mesos and/or YARN, and/or other commercially available and/or open source schedulers for the Docker ecosystem.

FastScore Engine.

The traditional use of the term ‘engine’ is the scoring engine generated by the FastScore compiler. As described throughout this specification, the term ‘PFAEngine’ is used instead to refer to that traditional use of the term ‘engine’, and the term ‘engine’ is used throughout this specification to instead refer to a component of the FastScore platform that generates a PFAEngine from a model, connects it to inputs, and collects its outputs. Thus an engine may comprise a PFAEngine within.

Configuration and Startup.

In an example implementation, each of the engines uses an identical Docker image. The configuration is passed to an engine using environment variables, such as the variables shown in the below table.

Variable          Description
FASTSCORE_USER    A user name
FASTSCORE_MODEL   A model name
FASTSCORE_INPUT   An input stream name

In this example, each of the variable values is a reference, not an actual value. The engine retrieves the model and stream descriptors from a database (e.g., using a facade).
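A sketch of how an engine might resolve these references at startup is shown below; the fetch helpers are hypothetical placeholders for facade calls.

import os

def fetch_model(name):
    # Hypothetical helper: retrieve the model from the database via a facade.
    ...

def fetch_stream(name):
    # Hypothetical helper: retrieve the stream descriptor via a facade.
    ...

# The environment variables hold references (names), not actual values.
user = os.environ["FASTSCORE_USER"]
model = fetch_model(os.environ["FASTSCORE_MODEL"])
stream = fetch_stream(os.environ["FASTSCORE_INPUT"])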

An example of a JSON resource that describes the engine for Marathon/Mesos and/or another scheduler:

{
  "cmd": "launcher",
  "cpus": 0.5,
  "mem": 256.0,
  "instances": 3,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "fastscore/engine",
      "network": "HOST"
    }
  },
  "env": {
    "FASTSCORE_USER": "mk",
    "FASTSCORE_MODEL": "orderbook-demo.ppfa",
    "FASTSCORE_INPUT": "cme-mdp"
  }
}

Running Models.

A user can start/stop engine(s) using the following commands:

fastscore job run <model-name> <stream-name>
fastscore job list
fastscore job scale
fastscore job status <job-id>
fastscore job output <job-id>
fastscore job stop <job-id>

In this example, each job may include many running engines. Jobs may generally not be kept in the database. Jobs may be reconstructed upon request using services of the underlying computing platform.

Specifically, an engine decodes the input stream descriptor, connects to the corresponding source, and starts feeding data to a PFAEngine. In some cases, re-running the streams may be performed when debugging the model.

Logging and Outputs.

FIG. 5C illustrates collecting outputs of an instance of an engine and posting them to the message queue. Similar processing is performed for log messages.

In an example implementation, the FastScore platform, including the Engine (222), Connect (208), Dashboard (212, 290), and Model Manager (210) components/microservices, may be implemented using the Scala programming language, and/or other programming languages may be used for implementing the FastScore platform.

Debugging API.

Debugging may utilize a RESTful API published by an engine.

fastscore engine list

fastscore engine stop

fastscore engine next

fastscore engine inspect cell

fastscore engine inspect pool

fastscore engine resume

Auto-Discovery.

In this example, a component/microservice automatically finds out how to connect to other components. In addition, a component/microservice is generally ready to reestablish the connection when an error occurs using the same discovery process.

Consider the following example. Assume that a facade has to connect to a database (434). It has a list of databases it understands, such as MySQL and Mongo DB. Assuming that MySQL is preferred, the search may start with MySQL. For MySQL, there may be a list of methods to discover its location. The facade may query Mesos-DNS first, then it may look for an appropriately tagged AWS instance, or it may try to connect to TCP port 3306 on the local machine. If all fail, it may then move on to Mongo DB. In addition, the facade may ensure that the database is accessible and contains expected data. Discovery for FastScore components/microservices is further described below.
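The fallback chain described above might be sketched in Python as follows; the Mesos-DNS and AWS probe helpers are hypothetical placeholders, and only the local-port probe is implemented.

import socket

def try_mesos_dns(service):
    # Hypothetical: resolve the service through Mesos-DNS; return (ip, port) or None.
    return None

def try_aws_tagged_instance(tag):
    # Hypothetical: look up an appropriately tagged AWS instance; return (ip, port) or None.
    return None

def try_local_port(port):
    # Probe a TCP port on the local machine (e.g., 3306 for MySQL).
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=1):
            return ("127.0.0.1", port)
    except OSError:
        return None

def discover_database():
    # MySQL is preferred, so its discovery methods are tried first;
    # if all fail, move on to Mongo DB.
    for probe in (lambda: try_mesos_dns("mysql"),
                  lambda: try_aws_tagged_instance("mysql"),
                  lambda: try_local_port(3306)):
        location = probe()
        if location:
            return ("mysql", location)
    for probe in (lambda: try_mesos_dns("mongodb"),
                  lambda: try_local_port(27017)):
        location = probe()
        if location:
            return ("mongodb", location)
    return None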

Discovery for Microservices.

In some cases it may be desirable to reduce/minimize the manual configuration of microservices and to have them configured to perform autodiscovery in order to discover their environment using well-defined mechanisms. An example of such a mechanism is Mesos-DNS, which is a standard component of a Marathon/Mesos cluster. As an example, Model Manager (210) is a FastScore microservice for storing/loading PFA/PPFA models. The PFA/PPFA model may be automatically translated and/or converted from other models in R, Python, C, and/or Java. Other microservices that depend on Model Manager (210) may find it in the cluster as described below. Model Manager (210) itself will generally depend on a database and will have to find, for example, a MySQL (434) instance upon startup. In another example implementation, the Model Manager (410) microservice can store models in memory and, thus, does not depend on a database.

Starting a Microservice.

Below is an example for starting a microservice.

{
  "id": "model-manage",
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "maximk/model-manage:latest",
      "network": "BRIDGE",
      "portMappings": [
        { "hostPort": 0, "containerPort": 8443 }
      ],
      "forcePullImage": true
    }
  },
  "cmd": "model_manage/bin/model_manage foreground",
  "instances": 1,
  "cpus": 0.1,
  "mem": 128
}

In this example, the microservice binds to port 8443 inside the container and this port is mapped by Marathon to a random host port. Part of a given task is to discover the value of this randomly assigned host port.

Model Manager (210) is now ready to start on the cluster using the below command:

$ dcos marathon app add model-manage.json

In a few moments a Model Manager (210) instance will start in the cluster.

Suppose XYZ, which is a microservice that depends on Model Manager (210), is started. XYZ may directly use model-manage.marathon.mesos as a host name of Model Manager.

Finding out the port number may introduce more complexity. Mesos-DNS exposes a REST API at a well-known location: http://master.mesos:8123. A few examples are provided below.

$ curl http://master.mesos:8123/v1/version
{
  "Service": "Mesos-DNS",
  "URL": "https://github.com/mesosphere/mesos-dns",
  "Version": "v0.5.1"
}

$ curl http://master.mesos:8123/v1/hosts/model-manage.marathon.mesos
[
  {
    "host": "model-manage.marathon.mesos.",
    "ip": "10.0.5.183"
  }
]

A more complex call may return both the IP address and the port number:

$ curl http://master.mesos:8123/v1/services/_model-manage._tcp.marathon.mesos
[
  {
    "service": "_model-manage._tcp.marathon.mesos",
    "host": "model-manage-mirhe-s1.marathon.slave.mesos.",
    "ip": "10.0.5.183",
    "port": "6594"
  }
]

Thus, upon startup the XYZ microservice attempts to GET http://master.mesos:8123/v1/services/_{service}._tcp.marathon.mesos for each microservice it depends on and later uses the IP addresses and port numbers returned by these calls. Note that Marathon may redeploy services to other hosts if they crash or change configuration (e.g., microservices can refresh information using Mesos-DNS if an endpoint stops responding).
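A minimal Python sketch of this startup lookup, assuming the Mesos-DNS REST API location given above and the third-party requests library:

import requests

MESOS_DNS = "http://master.mesos:8123"

def locate(service):
    # Query Mesos-DNS for the IP address and port of a dependency.
    url = "%s/v1/services/_%s._tcp.marathon.mesos" % (MESOS_DNS, service)
    records = requests.get(url).json()
    if records:
        return records[0]["ip"], int(records[0]["port"])
    return None

# Re-resolve via Mesos-DNS if an endpoint stops responding,
# since Marathon may redeploy services to other hosts.
print(locate("model-manage"))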

Fastscore Model Inspection.

Model inspection (292) comprises the intake of data science models (252) from a data science user. To continue the above example, FastScore is an embodiment of a streaming analytic engine: its core functionality comprises reading in records from a data stream, scoring them, and outputting that score to another data stream. Throughout this specification a ‘score’ is used in the data science sense as any output and/or prediction of a data science model, and ‘scoring’ is used to refer to the process of using a model to produce output and/or predictions.

As such, running any model consists of four steps:

1. Loading the model;
2. Configuring input and output streams;
3. Setting Engine parameters; and
4. Running the model.

Creating and Loading Models.

FastScore supports models in many languages, including Python, R, Java, PFA, PrettyPFA, and C. As a model interchange format, PFA provides benefits in performance, scalability, and security, and PrettyPFA is a human-readable equivalent to PFA. R and Python are typically in widespread use amongst data science users.

Models Via CLI.

The FastScore CLI (288) allows a user to load models directly from the command line. The list of models currently loaded in FastScore may be viewed using the model list command:

$ fastscore model list
Name     Type
-------  ------
MyModel  Python

Models may be added with model add <name> <file>, and removed with model remove <name>. Additionally, the fastscore model show <name> command will display the named model.

Models Via Dashboard.

The Dashboard (290) provides functionality to add and manage models via a GUI. To upload a model, an "upload model" widget may be used to choose a model from a local machine. Alternatively, a "select model" widget allows a user to select an existing model from the Model Manager (210) by name. Additionally, models may be added, removed, inspected, and edited within the Model Manager (210) GUI.

Design Rules for Models in Python and R.

All models may be added to FastScore and executed using the same CLI commands, namely:

fastscore model add <modelname> <path/to/model.extension>

Note that, in order to determine whether a model is Python or R, in one embodiment Engine-X uses the file extension (.py for Python, .R for R, .pfa for PFA, and .ppfa for PrettyPFA). Also, in order to score a Python/R model, there are design rules and/or certain constraints on the form of a given model.

FastScore comprises both a Python2 and a Python3 model runner (228). In one embodiment, by default, .py files are interpreted as Python2 models. To load a Python3 model, use the file extension .py3, or the -type:python3 option with fastscore model add:

fastscore model add -type:python3 my_py3_model path/to/model.py

to add a Python3 model.

Design Rules for Python Models.

As a design rule example, Python models should declare a one-argument action() function. The minimal example of a Python model is the following:

model.py

# fastscore.input: input-schema
# fastscore.output: output-schema

def action(datum):
    yield 0

This model will produce and/or score a 0 for every input. Additionally, Python models may declare begin() and end() functions, which are called at initialization and completion of the model, respectively. A slightly more sophisticated example of a Python model is the following:

model.py

# fastscore.input: input-schema
# fastscore.output: output-schema

import cPickle as pickle

def begin():
    # perform any initialization needed here
    global myObject
    myObject = pickle.load(open('object.pkl'))
    pass  # or do something with the unpickled object

def action(datum):
    # datum is expected to be of the form '{"x":5, "y":6}'
    record = datum
    x = record['x']
    y = record['y']
    yield x + y

def end():
    pass

This model returns the sum of two numbers with the design rule of using yield. Note that FastScore supports the ability to import Python's standard modules, such as the pickle module. Non-default packages may also be added using an import policy, as described below. Custom classes and packages may be loaded using attachments.

R Models.

R models feature similar functionality as Python models, as well as the same constraint for a design rule: the user defines an action function to perform the actual scoring. For example, the analogous model to the Python model above is:

R

# fastscore.input: input-schema
# fastscore.output: output-schema
# Sample input: {"x":5.0, "y":6.0}

action <- function(datum) {
  x <- datum$x
  y <- datum$y
  emit(x + y)
}

As a design rule, R uses emit for output.

Java Models.

Models written in the Java language that are supported include the following example types of models:

Generic Java code;

POJOs exported from H2O; and/or

Spark MLLib models.

Generic Java Models.

A generic Java model may execute arbitrary Java code. In order to run this model in the example FastScore architecture, it may implement a particular model interface: the IJavaModel interface. This interface includes design rules for begin, action, and end methods, analogous to Python and R models.

Generic Java Model

import fastscore.IJavaModel;

public class MyModel implements IJavaModel {
  public void begin() {
    ...
  }

  public String action(String datum) {
    ...
  }

  public void end() {
    ...
  }
}

H2O Models.

Although an H2O model may be structured as a generic Java model, the example FastScore architecture also provides a convenience feature to allow direct import of H2O models. In order to use this feature, the following steps may be taken:

1. A model should be saved as a POJO. Without further modifications, this exported POJO may be used as the model code in FastScore; and

2. To load the model in FastScore, the model name should match the exported POJO class name, and be explicitly specified as model type "h2o":

fastscore model add gbm_pojo_test gbm_pojo_test.java -type:h2o

When running H2O models, FastScore may output the original input record appended with an additional "Result" field that represents an array of prediction results. For example, in H2O's GBM airlines sample model, the input and output may be:

Input JSON Record:

{
  "Year": "2017",
  "Month": "06",
  "DayofMonth": "04",
  "DayOfWeek": "7",
  "CRSDepTime": "1030",
  "UniqueCarrier": "PS",
  "Origin": "SAN",
  "Dest": "ORD"
}

Output JSON Record:

{"CRSDepTime":"1030","Origin":"SAN","Month":"06","DayOfWeek":"7","Dest":"ORD","Year":"2017","UniqueCarrier":"PS","DayofMonth":"04","Result":["YES"]}

Note that the original order of the fields may not be preserved in the output record.

Spark MLLib Models.

The example FastScore architecture includes integrated Apache Spark libraries that allow adding models that leverage Spark MLLib. Java import statements may be safely used for required Spark packages in model code.

A Spark model should follow the same design rules and/or conformance guidelines as a generic Java model, and any previously saved model files/folders, for example in Parquet format, may be added as a model attachment. In general, the model may perform Spark context initialization in the begin() method.

Below is an example Spark model that assumes that the LogisticRegressionModel was previously created and saved under the scalaLogisticRegressionWithLBFGSModel folder and then uploaded to FastScore as an attachment.

MLLibLRModel.java

import fastscore.IJavaModel;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.SparkSession;
import org.json.JSONObject;
import org.json.JSONTokener;
import org.apache.spark.mllib.classification.LogisticRegressionModel;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

public class MLLibLRModel implements IJavaModel {
  LogisticRegressionModel _lrModel;

  public MLLibLRModel() {
    System.out.println("MLLib Linear Regression model");
  }

  public void begin() {
    SparkConf conf = new SparkConf();
    conf.setAppName("ML Lib LR Model");
    conf.setMaster("local");
    conf.set("spark.driver.host", "127.0.0.1");
    SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
    SparkContext sc = spark.sparkContext();
    _lrModel = LogisticRegressionModel.load(sc, "scalaLogisticRegressionWithLBFGSModel");
  }

  public String action(String datum) {
    try {
      Vector dv = Vectors.fromJson(datum);
      double res = _lrModel.predict(dv);
      JSONObject jsonObj = new JSONObject(new JSONTokener(datum));
      jsonObj.append("Prediction", res);
      return jsonObj.toString();
    } catch (Exception e) {
      return e.toString();
    }
  }

  public void end() {
  }
}

To add this model to FastScore, the following commands may be run:

tar czvf scalaLogisticRegressionWithLBFGSModel.tar.gz scalaLogisticRegressionWithLBFGSModel/
fastscore model add MLLibLRModel MLLibLRModel.java
fastscore attachment upload MLLibLRModel scalaLogisticRegressionWithLBFGSModel.tar.gz

JARs.

If a Java model requires one or more JAR files, they may be supplied together with any other required files as a single ZIP or .tar.gz attachment. The Java runner (228) will add all supplied JARs into the class path during compilation and runtime, so the model may safely import any required packages from these JARs.

Input and Output Schema.

FastScore may enforce strong typing on both the inputs and outputs of its models using AVRO schema. For R and Python models, this typing is enforced by specifying schema names in a smart comment at the top of the model file:

# fastscore.input: array-double
# fastscore.output: double

Python and R models specify schemas for their inputs and outputs. PrettyPFA and PFA models already contain the input and output schema as part of the model definition, so they may not require a schema attachment.

For example, a model that expects to receive records of two doubles as inputs might have the following schema:

input.avsc

{
  "name": "Input",
  "type": "record",
  "fields": [
    {"name": "x", "type": "double"},
    {"name": "y", "type": "double"}
  ]
}

The model might then produce a stream of doubles as its output:

output.avsc

{
  "name": "Output",
  "type": "double"
}

Input and output schema may be uploaded separately to FastScore. To upload the schema to FastScore with the CLI (288), the following commands may be used:

fastscore schema add input input.avsc
fastscore schema add output output.avsc

Attachments may also be managed from within the Dashboard (290), using a Model Manager (210) UI.

Input and Output Streams.

Before a model may be run, it generally should have some data to run on. Input and output streams may be used to supply the incoming data to the model, and to return corresponding scores. Many types of stream transports are supported, including: file, Kafka, HTTP, TCP, UDP, ODBC, debug, and console streams. In one embodiment, each type may be configured using a Stream Descriptor file.

Stream Descriptors are small JSON files containing information about the stream. An example of a Stream Descriptor for a Kafka stream is displayed below:

JSON

{
  "Description": "read Avro-typed JSON records from a Kafka stream",
  "Transport": {
    "Type": "kafka",
    "BootstrapServers": ["127.0.0.1:9092"],
    "Topic": "data-feed-1",
    "Partition": 0
  },
  "Encoding": "json",
  "Schema": { "type": "record", ... }
}

An example type of stream to use is a file stream, which reads or writes records directly from/to a file inside of the FastScore engine container. An example of such a stream is as follows:

file-stream-in.json

{
  "Description": "read input from the specified file",
  "Loop": false,
  "Transport": {
    "Type": "file",
    "Path": "/root/data/neural_net_input.jsons"
  },
  "Envelope": "delimited",
  "Encoding": "json",
  "Schema": {"type": "array", "items": "double"}
}

This file stream expects each line of the neural_net_input.jsons file to be a vector of doubles, encoded as a JSON object and delimited by newlines. The file is located in the /root/data/ directory of the engine container. The "Loop": false line indicates to FastScore to stop reading the file after reaching the end of the file, as opposed to looping over the lines in the file.

Streams Via CLI.

The FastScore CLI (288) may be used to configure data streams. The stream list command displays a list of existing streams:

Shell

$ fastscore stream list
demo-1
demo-2

By default, two demo file streams may be included in FastScore. The demo-1 data set consists of random numbers. The demo-2 data set consists of lists of JSON records with the following AVRO schema:

JSON

{
  "type": "array",
  "items": {
    "type": "record",
    "fields": [
      {"name": "x", "type": "double"},
      {"name": "y", "type": "string"}
    ]
  }
}

These demo streams may be used to test whether or not a simple model is working correctly. Additional streams may be added using the fastscore stream add <stream-name> <stream-descriptor-file> command. Existing streams may be sampled (displaying the most recent items of the stream) with fastscore stream sample <stream-name>.

For file streams, it is efficient to manage container input and output by linking a directory on the host machine to the engine container. This may be done in the Docker-Compose file by modifying the engine service to the following:

YAML

[...]
  engine-1:
    image: fastscore/engine-x:1.4
    network_mode: "host"
    stdin_open: true
    tty: true
    environment:
      CONNECT_PREFIX: https://127.0.0.1:8001
    volumes:             # new volume section
      - ./data:/root/data
[...]

This links the ./data directory on the host machine to the /root/data directory of the engine container. A file stream from the file "mydata.jsons" located in ./data on the host machine may then be accessed by FastScore using the stream descriptor

JSON

{
  "Loop": false,
  "Transport": {
    "Type": "file",
    "Path": "/root/data/mydata.jsons"
  },
  "Envelope": "delimited",
  "Encoding": "json",
  "Schema": [...]
}

A similar stream descriptor can be used for the output stream to write the output scores to a file in the same directory. Note that when using Docker volume linking to link a directory on the host machine to the Engine instance, Docker generally should have privileges to read and write from the specified directory. Additionally, the directory on the container should be chosen carefully, as its contents will be overwritten with the contents of the corresponding host directory upon linking. /root/data is safe (as it only contains the demo data files), but other directories on the container (e.g., /usr) may not be.

Streams Via the Dashboard.

Analogously to models, streams may also be manipulated from the Dashboard, for example, by selecting a "Streams" widget under a Model Manager (210).

Engine Parameters.

Engine parameters, such as the number of Engine instances currently running, as well as information about the model, may be displayed on the Dashboard (290) via an Engine (222) UI.

Running a Model in FastScore.

When using Dashboard (290), models may begin scoring as soon as both the model and input/output streams are set from the Engine (222) UI, and no further action from the user is necessarily required. Various statistics about performance and memory usage may be displayed on the Engine tab.

To run a model using the FastScore CLI, the fastscore job sequence of commands may be used:

fastscore job run <model-name> <input-stream-name> <output-stream-name>

runs the model named <model-name> with the specified input and output streams.

fastscore job stop halts the currently running model.

fastscore job status and fastscore job statistics display various information about the currently running job.

Some of the statistics displayed by the fastscore job statistics command, such as resource usage, CPU usage, and/or memory usage, are also shown on the Dashboard (290) based at least in part on sensors (272).

Import Policy.

Python and R models often make use of libraries, sometimes importing third-party libraries. For example, a Python model may contain the statement

import scikit-learn

When deploying models in production, it may be valuable to control which libraries are installed and usable by a model. The example FastScore Engine (222) provides this functionality through import policies.

An import policy manifest describes what the engine should do when a model references certain libraries. The manifest may be a file, for example a YAML-encoded file, with library names for keys. For example, the entry socket: prohibit instructs the engine not to load a model that references the socket library.

The possible entries are:

Entry                                            Description
<my-lib>: prohibit                               Do not load the model and issue an error.
<my-lib>: warn                                   Allow the model to run, but issue a warning.
<my-lib>: install                                Install the library using the default command.
<my-lib>: {policy: install, command: <command>}  Install the library using a custom command.

The engine (222) may know the standard install commands for model runners (228). For example, for Python, the engine may automatically use pip install <my-lib>.

An example import policy manifest for a Python runner (228) is:

YAML

os: prohibit
socket: warn
scikit-learn:
  policy: install
  command: pip install scikit-learn==3.2.1
nose: install

A model runner's (228) import policy manifest may be loaded from the import.policy file located in the appropriate directory in the engine's filesystem:

For Python2: /root/engine/lib/engine-fs/priv/runners/python

For Python3: /root/engine/lib/engine-fs/priv/runners/python3

For R: /root/engine/lib/engine-fs/priv/runners/R

In one example, the import policy for a model runner is fixed as soon as a model is loaded into the engine, so any changes to import policies must be made before running a model. To copy a new manifest into the container, the docker cp command or an equivalent may be used.

Record Sets and Control Records.

To better handle different model types, such as batch models, the exemplar FastScore architecture and its models support using record sets. A record set may be defined as an ordered collection of records. In R and Python, record sets are analogous to data frames and may be deserialized into data frames.

Models with Record Sets.

To configure an R or Python model to use record sets in its inputs or outputs, the # fastscore.recordsets: input or # fastscore.recordsets: output smart comments may be added to the model, respectively. To use record sets in both the input and output streams, the # fastscore.recordsets: both smart comment may be used. No changes to the model's input or output schema are required to use record sets.

Output Conventions.

There is some ambiguity involved when encoding a record set to an Avro type. To resolve this, the example FastScore architecture uses the following mapping conventions/design rules to determine how to encode each element in the output record set:

In Python:

-   If each output datum should be an Avro record, the model should yield a Pandas DataFrame;
-   If each output datum should be an Avro array, the model should yield a Numpy matrix; and/or
-   If each output datum should be an atomic Avro type (such as a double or string), the model should yield a Pandas Series.

In R:

-   If each output datum should be an Avro record, the model should yield a data.frame object;
-   If each output datum should be an Avro array, the model should yield a matrix object; and/or
-   Atomic Avro types may not be supported for R record set output.

Examples:

The following model uses record sets as inputs, and returns a single record as output.

Python

# fastscore.recordsets: input
# fastscore.input: two_doubles
# fastscore.output: summary_record

def action(record_set):
    sum1 = sum(record_set[0])
    sum2 = sum(record_set[1])
    yield {"sum1": sum1, "sum2": sum2}

Note that the variable record_set may be deserialized as a Pandas DataFrame. In this case, the input schema is

{"type": "array", "items": "double"}

and the output schema is

{
  "type": "record",
  "name": "summary",
  "fields": [
    {"name": "sum1", "type": "double"},
    {"name": "sum2", "type": "double"}
  ]
}

The next model uses record sets for both inputs and outputs.

Python

# fastscore.recordsets: both
# fastscore.input: named_doubles
# fastscore.output: named_doubles_with_sum

def action(record_set):
    mydf = record_set
    mydf['sum'] = mydf['x'] + mydf['y']
    yield mydf

Here, the input schema is:

{
  "type": "record",
  "name": "input",
  "fields": [
    {"name": "x", "type": "double"},
    {"name": "y", "type": "double"}
  ]
}

and the output schema is

{
  "type": "record",
  "name": "output",
  "fields": [
    {"name": "x", "type": "double"},
    {"name": "y", "type": "double"},
    {"name": "sum", "type": "double"}
  ]
}

Streams and Control Records.

To use record sets, input and output streams may be explicitly configured to do so by adding the "Batching": "explicit" flag. For example, a valid input stream descriptor for the second example above might be:

JSON

{
  "Loop": false,
  "Transport": {
    "Type": "file",
    "Path": "/root/data/input.jsons"
  },
  "Batching": "explicit",
  "Envelope": "delimited",
  "Encoding": "json",
  "Schema": {"$ref": "named_doubles"}
}

Additionally, to use record sets, control records may be injected into the data stream to mark the boundaries of a record set. A control record is a special type of record in the data stream that does not contain input/output data, but instead requests an action to be performed on the stream.

There are at least three types of control records supported in FastScore:

1. end. The "end" control record marks the end of the input stream. The underlying stream transport may contain more data, but this remaining data is ignored. The behavior mimics what happens when a stream receives an EOF signal from its transport;

2. set. The "set" control record marks the end of a record set, and is how a record set-oriented stream is created, as described above; and

3. pig. A "pig" control record travels the whole length of the FastScore pipeline. If a "pig" is injected into the input stream, it may appear in the output stream. The purpose of a "pig" is to provide dependency guarantees similar to a memory barrier in a modern CPU: no input records after the "pig" can affect the output records received before the "pig" in the output stream.

Each control record may declare some common properties:

Name       Type      Description
id         int (4)   A control record identifier.
timestamp  long (8)  A number of milliseconds since the unix epoch. Corresponds to the AVRO timestamp logical type (millisecond precision).
misc       string    ASCII characters only.

Control Records may have representations in each of the supported encodings, as described in the following table. This table uses Python 2 literals.

Encoding  end                    set                    pig
null      \xfastscore.end        \xfastscore.set        \xfastscore.pig
utf-8     \u262efastscore.end    \u262efastscore.set    \u262efastscore.pig
json      {"$fastscore":"end"}   {"$fastscore":"set"}   {"$fastscore":"pig"}

NOTE for null: The record size should be at least 12 bytes. If there are at least 12 more bytes after the 12-byte prefix, then it may contain the ID and timestamp encoded using the '!Q' Python struct format. Any data that follows is the value of the misc property.

NOTE for utf-8: ID, timestamp, and misc values may be appended, separated by pipes. For example, '\u262efastscore.pig|1234|3476304987|misc-data'.

NOTE for json: ID, timestamp, and misc values may be added as properties.

A data stream using JSON encoding for the second model example above might look like the following:

{"x":3.0, "y":2.0}
{"x":2.5, "y":2.5}
{"x":-3.2, "y":-1.0}
{"$fastscore":"set"}

The corresponding output stream would be:

{"x":3.0, "y":2.0, "sum":5.0}
{"x":2.5, "y":2.5, "sum":5.0}
{"x":-3.2, "y":-1.0, "sum":-4.2}
{"$fastscore":"set"}
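As an illustration, such a record-set-delimited stream could be produced to the Kafka topic used in earlier examples with the third-party kafka-python package; this is a sketch, not part of FastScore itself.

import json
from kafka import KafkaProducer  # third-party kafka-python package

producer = KafkaProducer(bootstrap_servers="127.0.0.1:9092")

records = [{"x": 3.0, "y": 2.0}, {"x": 2.5, "y": 2.5}, {"x": -3.2, "y": -1.0}]
for record in records:
    producer.send("data-feed-1", json.dumps(record).encode("utf-8"))

# A "set" control record marks the end of the record set.
producer.send("data-feed-1", json.dumps({"$fastscore": "set"}).encode("utf-8"))
producer.flush()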

State Sharing and Snapshotting.

The exemplar FastScore architecture may support state sharing and snapshotting in models, for example Python and R models. This may be achieved by providing a cell and pool interface to the models. Cells and pools provide persistent storage.

The distinction between a cell and a pool is that cells are global variables that are shared across all runners, whereas pools are like environments in R: collections of key-value pairs that are shared across all runners, and may be manipulated at runtime. Additionally, the state of the engine's cells and pools may be "snapshotted": saved and exported for later use.

State Management.

The cell-and-pool system may allow multiple instances of a model runner (228) to share data. Below are examples of models that may be run concurrently. To change the number of instances running a model, the fastscore job scale CLI (288) command may be used, the /job/scale API (286) call may be used, or the Dashboard (290) may be used.

For example, the R programming language does not support concurrency well. The system disclosed may automatically scale, based on input from sensors (272), a plurality of R-based model engines (222) to parallelize tasks if, for example, a sensor (272) indicates that CPU resources are not being utilized efficiently. State sharing and/or state snapshotting saved externally makes such scaling possible and/or efficient.

Both of the sample models below use state management in similar ways. A key difference is that the cell model updates a global variable named ‘counter’, whereas the pool model updates the ‘x’ key inside of the ‘counter’ pool. The Python model cell example:

Cell Model. Python Cell Example.

# fastscore.input: int
# fastscore.output: int

import fastscore

def begin():
    fastscore.cells('counter').set(0)

def action(x):
    counter = fastscore.cells('counter')
    counter.update(lambda y: x + y)
    yield counter.value

The R model cell example:

Cell Model. R Cell Example.

# fastscore.input: int
# fastscore.output: int

library(fastscore)

begin <- function() {
  counter <- Cell('counter')
  Set(counter, 0)
}

action <- function(x) {
  counter <- Cell('counter')
  Update(counter, function(y) y + x)
  emit(Get(counter))
}

For a given input, this example model returns the running sum of all inputs received so far. So, for example, the expected output of the inputs 1, 2, 3 is 1, 3, 6.

The Python Model Pool Example:

Pool Model. Python Pool Example.

# fastscore.input: int
# fastscore.output: int

import fastscore

def begin():
    fastscore.pools('counter').set('x', 0)

def action(v):
    counter = fastscore.pools('counter')
    counter.update('x', lambda x: x + 1)
    yield counter.get('x')

The R Model Pool Example:

Pool Model. R Pool Example.

# fastscore.input: int
# fastscore.output: int

library(fastscore)

begin <- function() {
  counter <- Pool('counter')
  Set(counter, 'x', 0)
}

action <- function(datum) {
  counter <- Pool('counter')
  Update(counter, 'x', function(x) x + 1)
  emit(Get(counter, 'x'))
}

For every input, this example pool model returns the total number of inputs received. So, for example, the expected output of the inputs 5, 5, 5 is 1, 2, 3.

Snapshotting.

Snapshotting is a mechanism for capturing the state of an engine's (222) cells and pools. Model snapshots are automatically created when a model receives an end-of-stream message. To support snapshotting, the exemplar FastScore CLI (288) provides convenient wrappers around the snapshot RESTful API. These commands include:

fastscore snapshot list <model name>
fastscore snapshot restore <model name> <snapshot id>

For example, the snapshot list command shows the saved snapshots for a given model, and the snapshot restore command restores the specified snapshot for a particular model. Snapshots may be automatically created upon receipt of an end-of-stream message, but these end-of-stream messages may need to be introduced as control records into the data stream for streaming transports, for example Kafka. To enable snapshots, a fastscore.snapshots smart comment may be used:

# fastscore.snapshots: eof

An example of a Python model that creates snapshots on end of stream is:

Python snapshots

# fastscore.input: int
# fastscore.output: int
# fastscore.snapshots: eof

import fastscore

def begin():
    cell = fastscore.cells('sum')
    if cell.value == None:
        cell.set(0)

def action(datum):
    cell = fastscore.cells('sum')
    cell.update(lambda x: x + datum)
    yield cell.value

An example of an R model that creates snapshots on end of stream is:

R snapshots

# fastscore.input: int
# fastscore.output: int
# fastscore.snapshots: eof

library(fastscore)

begin <- function() {
  cell <- Cell('sum')
  if (length(Get(cell)) == 0) {
    Set(cell, 0)
  }
}

action <- function(datum) {
  c_sum <- Cell('sum')
  result <- datum + Get(c_sum)
  Set(c_sum, result)
  emit(result)
}

Example of Sensors.

Continuing the example for FastScore, an example of a sensor (272) is a configurable function that:

-   May be turned on and off, such as installed/uninstalled;
-   May be associated with a particular spot on the code path, such as a "tapping point";
-   May activate according to a schedule;
-   May filter/aggregate measurements;
-   May publish results separately from the output stream;
-   May have a language-agnostic descriptor; and/or
-   May output cost-related information.

Examples of potential uses for a sensor (272) include: record/byte counters at the edge of an input stream; CPU utilization measurements for a main process/loop; and/or a memory usage gauge for a model runner (228). A sensor (272) may be added to a FastScore microservice by default, for example the memory usage monitor present in a Dashboard (290).

Sensor Descriptors.

A sensor descriptor may be conceptually similar to a stream descriptor: it has a name, and is stored in Model Manager (210). A template for a sensor descriptor is:

JSON

{
  "Tap": "sys.memory",
  "Activate": {
    "Type": "regular",
    "Interval": 0.5
  },
  "Report": {
    "Interval": 3.0
  },
  "Filter": {
    "Type": ">=",
    "Threshold": "1G"
  }
}

This example of a sensor (272) reports system memory usage, but only if it exceeds 1 gigabyte. Examples of fields for configuring sensors (272) and their corresponding features include the following:

Field                 Type    Explanation
Tap                   string  The tapping point for the sensor. Currently, this may be one of at least "sys.memory" and "sys.cpu.utilization".
Activate              object  A field to describe when to collect sensor readings. In the above example, the sensor activates every 0.5 seconds.
Activate.Type         string  Allowed values: "permanent", "regular", "random".
Activate.Intensity    float   The number of activation events per second. (The Interval element should be omitted.)
Activate.Interval     float   The time between activation events. (The Intensity element should be omitted.)
Activate.Duration     float   The time to keep the tapping point active.
Activate.MaxReads     int     Deactivate after receiving this many reads.
Report                object  A field to describe when to report sensor readings. In the above example, the sensor reports every 3 seconds.
Report.Interval       float   How often to report sensor readings.
Filter                object  The filtering function of the sensor. May default to null.
Filter.Type           string  Allowed values include: ">", ">=", "<", "<=", "within-range", "outside-range", "datatype", "mean", "median", "mode", "standard_deviation", "variance", "pdf", "cdf".
Filter.Threshold      float   The threshold value for less-than or greater-than comparisons.
Filter.MinValue       float   The minimum value for range filters.
Filter.MaxValue       float   The maximum value for range filters.
Aggregate             object  The aggregation function of the sensor. Accepts "accumulate", "sum", and "count" as shortcuts. May default to "accumulate".
Aggregate.Type        string  One of "accumulate", "sum", or "count".
Aggregate.SampleSize  int     The maximum number of values to accumulate.

Note that filter values such as Threshold, MinValue, and MaxValue may accept human-friendly values, for example "1G" as well as 1073741824.
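A sketch of how such human-friendly values might be normalized is below; parse_size is a hypothetical helper, not part of the FastScore API.

def parse_size(value):
    # Accept "1G" as well as 1073741824.
    if isinstance(value, (int, float)):
        return value
    suffixes = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}
    if value[-1].upper() in suffixes:
        return float(value[:-1]) * suffixes[value[-1].upper()]
    return float(value)

assert parse_size("1G") == 1073741824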

An Example.

To add the sensor example above to FastScore, the CLI (288) may be used:

fastscore sensor add s1 <<EOF
{
  "Tap": "sys.memory",
  "Activate": {
    "Type": "regular",
    "Interval": 0.5
  },
  "Report": {
    "Interval": 3.0
  }
}
EOF

After entering this command, the CLI (288) may return Sensor 's1' added if the command was successful.

In an example implementation, all sensors (272) may be installed on Model Manager (210). Installing the sensor may include the command:

$ fastscore tap install model-manage-1 s1
Sensor 's1' installed [2]

The number in the square brackets is the identifier of the sensor deployment. The identifier may be needed to stop the sensor later. It may also be found from the fastscore tap list command:

$ fastscore tap list model-manage-1
  Id  Tap         Active
----  ----------  ------
   2  sys.memory  No

The example sensor may activate periodically, that is, 2 times a second, and collect the memory consumed by the service. The collected data may be reported as Pneumo messages, such as Kafka messages on the topic "notify", every 3 seconds. These may be viewed in the CLI with the fastscore pneumo command:

$ fastscore pneumo
[model-manage-1] 16:10:44 sys.memory [2] [2123345920, 2123345920, 2123345920, 2123345920, 2123345920, 2123345920]
[model-manage-1] 16:10:47 sys.memory [2] [2123345920, 2123345920, 2123345920, 2123345920, 2123345920, 2123345920]
[model-manage-1] 16:10:50 sys.memory [2] [2123345920, 2123345920, 2123345920, 2123345920, 2123345920, 2123345920]

The sensor may be uninstalled using the fastscore tap uninstall command:

$ fastscore tap uninstall model-manage-1 2
Sensor [2] uninstalled

Once uninstalled, these reports may no longer be sent through Pneumo.

Example of I/O Descriptor Abstraction.

An I/O descriptor abstraction (232) may be a JSON document that includes all of the information about a stream. In general, an input stream reads messages from an underlying transport, optionally verifies them, and feeds them to models. The output stream may act similarly, but in the reverse order. I/O descriptor abstractions are required for the engine to read input and produce output, and may additionally be used to enforce input/output typing and/or data science constraints using AVRO schema.

In one embodiment, by convention, all field names of an I/O descriptor abstraction (232) start with a capital letter and do not use punctuation. Many fields of an I/O descriptor abstraction (232) may have default values that depend on the values of other fields. If a field is omitted, it may be set to a default value. Sometimes, a user may set a field to null to avoid this default behavior. For example, if omitted, an EndMarker is set to "$end-of-stream" for certain streams. To disable the 'soft' end-of-file behavior based on the EndMarker, the field may be set to null. Some fields may accept shortcut values. For instance, the Transport field may be set to a "discard" string instead of the equivalent yet more verbose '{"Type": "discard"}' object.

Field Descriptions.

A template for an I/O descriptor abstraction (232) is below. Note that the type of transport used may determine which fields in the Transport section are needed. Additionally, the top-level fields Loop, EndMarker, SkipTo, and SkipToRecord may have default values depending on the choice of transport.

Stream Descriptor Template

{
  "Version": "1.2",
  "Description": "My Sample I/O descriptor abstraction",
  "Transport": {
    "Type": "REST" | "HTTP" | "kafka" | "file" | "TCP" | "UDP" | "debug" | "console" | "discard",
    "Url": "http://www.mywebsite.com",                          // HTTP only
    "BootstrapServers": ["127.0.0.1:9181", "192.168.1.5:9003"], // Kafka only
    "Topic": "My Kafka Topic",                                  // Kafka only
    "Partition": 1,                                             // Kafka only, defaults to 0
    "MaxWaitTime": 50000,                                       // Kafka only, defaults to 8388607
    "Path": "/path/to/file.ext",                                // file only
    "Host": "127.0.0.1",                                        // TCP only
    "Port": 8182,                                               // TCP and UDP only
    "BindTo": "127.0.0.1",                                      // UDP only, defaults to 0.0.0.0
    "Data": ["abc", "def", "ghi"],                              // debug only
    "DataBinary": ["AQIDBQ=="]                                  // debug only, use one of Data or DataBinary
  },
  "Loop": false,
  "EndMarker": "$end-of-stream",
  "SkipTo": null,
  "SkipToRecord": "latest",
  "Envelope": "delimited",
  "Encoding": null | "json" | "avro-binary",
  "Schema": { ... }                                             // AVRO schema for stream
}

Once an I/O descriptor abstraction (232) has been constructed, it may be validated against the following AVRO schema. Some modification of this schema may be required dependent on the choice of default values.

Stream Descriptor Schema

{
  "type": "record",
  "fields": [
    {"name": "Version", "type": "string"},
    {"name": "Description", "type": "string", "default": ""},
    {"name": "Transport", "type":
      [
        {"type": "record", "fields": [
          {"name": "Type", "type": "string"}]},                  // REST
        {"type": "record", "fields": [
          {"name": "Type", "type": "string"},                    // HTTP
          {"name": "Url", "type": "string"}]},
        {"type": "record", "fields": [
          {"name": "Type", "type": "string"},                    // Kafka
          {"name": "BootstrapServers", "type": {"type": "array", "items": "string"}},
          {"name": "Topic", "type": "string"},
          {"name": "Partition", "type": "int", "default": 0},
          {"name": "MaxWaitTime", "type": "int", "default": 8388607}]},
        {"type": "record", "fields": [
          {"name": "Type", "type": "string"},                    // File
          {"name": "Path", "type": "string"}]},
        {"type": "record", "fields": [
          {"name": "Type", "type": "string"},                    // TCP
          {"name": "Host", "type": "string"},
          {"name": "Port", "type": "int"}]},
        {"type": "record", "fields": [
          {"name": "Type", "type": "string"},                    // UDP
          {"name": "BindTo", "type": "string", "default": "0.0.0.0"},
          {"name": "Port", "type": "int"}]},
        {"type": "record", "fields": [
          {"name": "Type", "type": "string"},                    // debug
          {"name": "Data", "type": ["string", {"type": "array", "items": "string"}]}, // only one of Data or DataBinary is required
          {"name": "DataBinary", "type": "string"}]},
        {"type": "record", "fields": [{"name": "Type", "type": "string"}]},
        "string"                                                 // discard or console
      ]},
    {"name": "Loop", "type": "boolean", "default": false},       // default depends on transport
    {"name": "EndMarker", "type": ["null", "string"], "default": "$end-of-stream"}, // default depends on transport
    {"name": "SkipTo", "type": ["null", "int"], "default": null},
    {"name": "SkipToRecord", "type": ["null", "int", "string"], "default": "latest"}, // default depends on transport
    {"name": "Envelope", "type":
      ["string",
        {"type": "record", "fields": [                           // delimited
          {"name": "Type", "type": "string"},
          {"name": "Separator", "type": "string", "default": "\n"}]},
        {"type": "record", "fields": [                           // ocf-block
          {"name": "Type", "type": "string"},
          {"name": "SyncMarker", "type": "string"}]}
      ]},
    {"name": "Encoding", "type": ["null", "string"]},
    {"name": "Schema", "type": ["string", "object"]}
  ]
}

Common Fields

The following table describes common fields used in I/O descriptor abstractions (232). Fields in italics may be optional.

Field         Type                                 Description                                                 Default Value                                                 Example
Version       string                               The version of the stream descriptor.                       "1.2"                                                         "1.2"
Description   string                               A description for this stream (optional).                   (empty)                                                       "An input file stream."
Transport     string or object                     Specifies the details of the Transport for this stream
                                                   (see below).
Loop          boolean                              Set to true to read the stream in a loop.                   true for file streams, false otherwise                        true
EndMarker     string                               An end-of-stream marker to indicate that the last message   null for AVRO binary streams, $end-of-stream for all others   "LastMessage"
                                                   in the stream has been received.
SkipTo        int or null                          Skip to the byte offset when starting to read the stream.   null                                                          5
SkipToRecord  int, string, or null                 Skip to record by number or keyword.                        "latest" for Kafka streams, null otherwise                    "latest"
Envelope      "delimited", "ocf-block", or object  Specifies the framing of the messages in the stream         "delimited" or null                                           "delimited"
                                                   (see below).
Encoding      null, "json", or "avro-binary"       Specifies the encoding of the messages in the stream.
Schema        null, string, or object              AVRO schema for records in this stream.                     null                                                          "int"

The Schema field may specify schemas by reference (as well as explicitly define them). A schema reference takes the following example form:

"Schema": {"$ref": "schema_name"}

where schema_name is the name of the schema in Model Manager (210).

Transport Fields.

There are various possible fields in Transport descriptors. As before, fields in italics are optional.

REST.

The REST stream transport does not include any additional transport fields.

HTTP.

HTTP streams contain at least one field: the URL of the data source.

Field  Type    Description           Default  Example
Url    string  The URL of the data.  (none)   "http://www.path.to/file.extension"

Kafka.

Kafka stream transports have several possible fields:

Field             Type             Description                                          Default                    Example
BootstrapServers  array of string  A list of the Kafka bootstrap servers.               (none)                     ["192.168.1.5:9002", "127.0.0.1:9003"]
Topic             string           The Kafka topic.                                     (none)                     MyKafkaTopic
Partition         int              The Kafka partition.                                 0                          5
MaxWaitTime       int              The maximum time to wait before declaring that the   8388607 (approx. 25 days)  500
                                   end of the stream has been reached.

File.

File streams only have one parameter: the path to the file. Note that the path to the file is relative to the Engine (222) container's filesystem, not the filesystem of the machine hosting the Engine.

Field  Type    Description            Default  Example
Path   string  The path to the file.  (none)   "/path/to/file"

UDP.

UDP Transports may be described using at least two fields.

Field   Type    Description                 Default    Example
BindTo  string  The IP address to bind to.  "0.0.0.0"  "127.0.0.1"
Port    int     The port to listen to.      (none)     8000

TCP.

TCP transports require a mandatory specification of both a host and a port.

Field  Type    Description                          Default  Example
Host   string  The IP address of the host machine.  (none)   "127.0.0.1"
Port   int     The port of the host machine.        (none)   8765

Debug.

A debug transport type may allow the user to embed a batch of records to be scored directly into an input stream descriptor (232). As the name implies, it is intended primarily for model and stream debugging.

Field       Type                          Description                                                          Default  Example
Data        a string or array of strings  A single record, or an array of JSON records to be scored.           (none)   ["\"json string\""]
DataBinary  a string or array of strings  Either a base64-encoded binary datum or an array of base64-encoded   (none)   "AQIDBQ=="
                                          messages.

Console and Discard.

The console and discard transports have no fields. The discard transport discards all content; as such, it only makes sense for output streams where a user does not care about the output of the engine.

Console streams are subtle: output is relayed back to the FastScore CLI (288). In order for this to work, however, the CLI should be in "interactive" mode (i.e., started with the fastscore command), and FastScore may be configured to use Pneumo, a library that enables asynchronous notifications over Kafka.

Transport-Specific Examples.

Examples of I/O descriptor abstractions (232) for various combinations of transports, encodings, and envelopes are given below.

REST Stream Examples.

The REST transport allows inputs to be delivered to the engine with the /1/job/input POST command. If the output stream is also set to REST, the /1/job/output GET command can be used to retrieve the resulting scores.

JSON

{
  "Transport": {
    "Type": "REST"
  },
  "Envelope": "delimited",
  "Encoding": "JSON",
  "Schema": null
}
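With both streams set to REST as above, delivering an input record and retrieving a score might be sketched in Python as follows; the engine address is a placeholder and the third-party requests library is assumed.

import requests

ENGINE = "https://localhost:8003"  # hypothetical engine address

# Deliver one input record to the engine.
requests.post(ENGINE + "/1/job/input", data='{"x":5, "y":6}', verify=False)

# Retrieve the resulting score.
score = requests.get(ENGINE + "/1/job/output", verify=False)
print(score.text)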

Debug Stream Examples.

Below is an example of a debug stream, where the messages are all inline, and separated by newlines.

Debug Inline Stream

{
  "Version": "1.2",
  "Description": "read an inline sequence of 3 messages separated by newlines",
  "Transport": {
    "Type": "debug",
    "Data": "aaa\nbbb\nccc"
  },
  "Envelope": "delimited",
  "Encoding": null,
  "Schema": null
}

Below is an example of a debug stream using a list of binary inputs.

Debug Binary Stream

{
  "Version": "1.2",
  "Description": "read an inline sequence of 3 binary messages",
  "Transport": {
    "Type": "debug",
    "DataBinary": ["uKs/srYgWfY=",
                   "kiqGJppq2Z4=",
                   "VBPsuSTfUiM="]
  },
  "Envelope": "delimited",
  "Encoding": null,
  "Schema": null
}

HTTP Examples.

The following is an example of an HTTP stream.

HTTP Example

{
  "Version": "1.2",
  "Description": "read a sequence of opaque (unicode) strings separated by newlines over HTTP transport",
  "Transport": {
    "Type": "HTTP",
    "Url": "https://s3-us-west-1.amazonaws.com/fastscore-sample-data/prime.test.stream"
  },
  "Envelope": {
    "Type": "delimited",
    "Separator": "\r\n"
  },
  "Encoding": null,
  "Schema": null
}

Kafka Examples.

The following example is a stream descriptor for a Kafka input stream.

Kafka Input Example

{
  "Version": "1.2",
  "Description": "read a sequence of opaque (unicode) strings over Kafka transport",
  "Transport": {
    "Type": "kafka",
    "BootstrapServers": ["127.0.0.1:9092"],
    "Topic": "data-feed-1",
    "Partition": 0
  },
  "Envelope": null,
  "Encoding": null,
  "Schema": null
}

This example writes a sequence of AVRO-binary typed data to a Kafka stream.

Kafka AVRO-binary

{
  "Version": "1.2",
  "Description": "write a sequence of binary-encoded Avro documents to Kafka",
  "Transport": {
    "Type": "kafka",
    "BootstrapServers": ["127.0.0.1:9092"],
    "Topic": "data-feed-1",
    "Partition": 0
  },
  "Envelope": null,
  "Encoding": "avro-binary",
  "Schema": { "type": "record", ... }
}

File Stream Examples.

The following is an example of a file stream input, expecting each line of the file to contain an integer. An analogous stream descriptor can be used for a file output stream. Note that /root/data/input.jsons refers to the path to input.jsons inside of the engine container (222), not on the host machine.

File Input Stream

{
  "Version": "1.2",
  "Loop": false,
  "Transport": {
    "Type": "file",
    "Path": "/root/data/input.jsons"
  },
  "Envelope": "delimited",
  "Encoding": "json",
  "Schema": "int"
}

TCP Examples.

Here's an example TCP stream descriptor.

TCP Example

{
  "Version": "1.2",
  "Description": "read a sequence of untyped json separated by newlines over TCP transport",
  "Transport": {
    "Type": "TCP",
    "Host": "127.0.0.1",
    "Port": 12012
  },
  "Envelope": "delimited",
  "Encoding": "json",
  "Schema": null
}

UDP Examples.

The following stream descriptor describes a UDP input stream.

UDP Example

{
  "Version": "1.2",
  "Description": "read a sequence of untyped json documents over UDP transport",
  "Transport": {
    "Type": "UDP",
    "BindTo": "0.0.0.0",
    "Port": 53053
  },
  "Envelope": null,
  "Encoding": "json",
  "Schema": null
}

Example Schema Reference.

To continue the above example of an analytic engine system, FastScore enforces strict typing of engine inputs and outputs at two levels: stream input/output (232), and model input/output (224). Types may be declared using AVRO schema.

To support this functionality, FastScore's Model Manager (210) maintains a database of named AVRO schemas. Python and R models may then reference their input and output schemas using smart comments. PrettyPFA and PFA models may instead explicitly include their AVRO types as part of the model format. Stream descriptors may either reference a named schema from Model Manager (210), or they may explicitly declare schemas. Throughout this specification, a check may include a datatype check, for example int vs float, or a data science check, for example ensuring the mean or standard deviation of a stream is within tolerances.

In either case, FastScore performs the following type checks:

1. Before starting a job: the input stream's schema is checked for compatibility against the model's input schema, and the output stream's schema is checked for compatibility against the model's output schema;

2. When incoming data is received: the incoming data is checked against the input schemas of the stream and model; and/or

3. When output is produced by the model: the outgoing data is checked against the model's and stream's output schemas.

Failures of any of these checks are reported: schema incompatibilities between the model and the input or output streams may produce an error, and the engine (222) will not run the job. Input or output records that are rejected due to schema incompatibility appear as Pneumo messages, and a report of rejected records is also shown in Dashboard's Engine panel.
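The per-record check is conceptually similar to the following Python sketch, which validates records against an AVRO schema using the third-party fastavro package; this is an analogy to the engine's internal check, not its actual implementation.

from fastavro.validation import validate  # third-party fastavro package

input_schema = {
    "name": "Input",
    "type": "record",
    "fields": [
        {"name": "x", "type": "double"},
        {"name": "y", "type": "double"},
    ],
}

# An incoming record is checked against the input schema;
# a rejected record would be reported rather than scored.
print(validate({"x": 5.0, "y": 6.0}, input_schema, raise_errors=False))    # True
print(validate({"x": 5.0, "y": "six"}, input_schema, raise_errors=False))  # False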

Examples.

The following model takes in a record with three fields (name, x, and y), and returns the product of the two numbers.

model.py

# fastscore.input: named-array
# fastscore.output: named-double

def action(datum):
    my_name = datum['name']
    x = datum['x']
    y = datum['y']
    yield {'name': my_name, 'product': x*y}

The corresponding input and output AVRO schema are:

named-array.avsc

{
  "type": "record",
  "name": "input",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "x", "type": "double"},
    {"name": "y", "type": "double"}
  ]
}

named-double.avsc

{
  "type": "record",
  "name": "output",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "product", "type": "double"}
  ]
}

So, for example, this model may take as input the JSON record

{"name":"Bob", "x":4.0, "y":1.5}

and score this record to produce

{"name":"Bob", "product":6.0}

Once FastScore is running, the model and associated schemas may be added to a model manager (210) with the following commands:

fastscore schema add named-array named-array.avsc
fastscore schema add named-double named-double.avsc
fastscore model add my_model model.py

Assuming that the input and output descriptor abstractions have been configured to use these schemas, the job may be run with:

fastscore job run my_model <input stream name> <output stream name>

The stream descriptors may be set to use these schemas with the Schema field. For example, for the input stream descriptor:

"Schema": {"$ref": "named-array"}

Note that in the model's smart comments, the CLI (288) commands, and the stream descriptor schema references alike, the schemas are referenced by their name in a model manager (210), not by the filename or any other property.

Example of Model Deploy.

As an example of a model deploy (293), for the above example of the FastScore architecture, Model Deploy (293) is a containerized Jupyter notebook server with FastScore's model deployment and Jupyter integration toolkit built in. It may be built on top of the Jupyter data science Docker image. Model Deploy provides model creation and deployment tools for R, Python 2, and Python 3 notebooks, as well as for PFA.

Starting Model Deploy.

Starting Model Deploy (293) may include the following command:

docker run -it --rm -p 8888:8888 fastscore/model-deploy:latest

If other services in the FastScore fleet are also running on the same host, it may be advantageous to start Model Deploy (293) with the --net="host" option, so that these services are accessible from localhost.

Model Deploy may also be started with any of the additional configuration options available to the Jupyter base Docker image. Once the container (293) is created, it may be accessible on port 8888 by default on the host machine, using the token generated during the startup process.

Model Deploy Functionality.

Model Deploy provides a number of features to make it easy to migrate a model (252) into FastScore:

-   Python and R supply a Model class that can be used for validation and testing of a model locally, before deploying to a FastScore engine (222);
-   The Model.from_string function in Python and the Model_from_string function in R provide shortcuts for creating a Model object from a string of code. In Python notebooks, the %%py2model, %%py3model, %%pfamodel, and %%ppfamodel cell magic commands may automatically convert the contents of a cell into a Python 2, Python 3, PFA, or PrettyPFA model object, respectively;
-   The Engine class allows for direct interaction with a FastScore Engine (222), including scoring data using a running Engine;
-   Model objects may be deployed directly to a FastScore Engine (222) from within the Jupyter notebook, as well as to Model Manager (210); and/or
-   A utility codec library is included to make it easier to serialize R and Python objects to JSON and other formats based on an Avro schema.

Example notebooks demonstrating this functionality may be included with the Model Deploy container.
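By way of illustration, a notebook session using these features might look like the following sketch; the import path, engine URL, and method signatures are assumptions based on the descriptions above, not a definitive API.

from fastscore import Model, Engine  # assumed import path

model_code = '''
# fastscore.input: named-array
# fastscore.output: named-double
def action(datum):
    yield {"name": datum["name"], "product": datum["x"] * datum["y"]}
'''

model = Model.from_string(model_code)      # create a Model object from a string of code
engine = Engine("https://localhost:8003")  # attach to a running engine (URL assumed)
model.deploy(engine)                       # deploy directly from the notebook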

Example Tutorial for a Data Science Model.

Gradient Boosting Regressors (GBR) are ensemble decision tree regressor models. In this example, a GBR model is prepared for use in the above example of an analytic engine architecture, the FastScore architecture. A model is constructed to estimate the reliability of various automobiles.

The model is constructed in Python using SciKit Learn, and both input and output data streams use Kafka. This example demonstrates several features of FastScore:

-   Running a trained Python model in FastScore;
-   Installing additional Python libraries in a FastScore engine;
-   Using custom classes with model attachments; and/or
-   Scoring records over Kafka streams.

This example uses the following Python libraries:

-   NumPy (numpy)
-   Pandas (pandas)
-   SciKit Learn (sklearn)
-   Kafka (kafka, if you're using the included Python Kafka client)

Each of these libraries may be installed using pip.

Overview of Gradient Boosting Regressors.

Gradient boosting regressors are a type of inductively generated tree ensemble model. At each step, a new tree is trained against the negative gradient of the loss function, which is analogous to (or identical to, in the case of least-squares error) the residual error.
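To make the boosting step concrete, the following toy sketch fits each new tree to the residuals of the current ensemble under squared-error loss; it omits the refinements of SciKit Learn's implementation (shrinkage schedules, subsampling, and so on) and uses synthetic data for illustration only.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

prediction = np.full_like(y, y.mean())   # start from a constant model
trees, learning_rate = [], 0.1
for _ in range(50):
    residual = y - prediction            # negative gradient of squared error
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)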

Training and Running a GBR Model in SciKit Learn.

This section reviews how to train a GBR model using SciKit Learn in Python.

The Dataset and the Model.

A GBR model (252) is designed by a data science user to estimate reliability for various types of automobiles from various features of the vehicle. The scores produced are numbers between −3 and +3, where lower scores indicate safer vehicles.

Transforming Features.

For best results from the GBR model (252), preprocessing of the input data is performed. To keep the model itself as simple as possible, the feature preprocessing is separated from the actual scoring and encapsulated in its own module:

FeatureTransformer.py

from itertools import chain

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import Imputer, StandardScaler
from sklearn.pipeline import Pipeline

# define transformer to scale numeric variables
# and one-hot encode categorical ones
class FeatureTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, transforms = [("impute", Imputer()), ("scale", StandardScaler())]):
        self.transforms = transforms

    def fit(self, X, y = None):
        self.columns_ = X.columns
        self.cat_columns_ = X.select_dtypes(include = ["object"]).columns
        self.non_cat_columns_ = X.columns.drop(self.cat_columns_)
        self.pipe = Pipeline(self.transforms).fit(X.ix[:, self.non_cat_columns_])
        self.cat_map_ = {col: X[col].astype("category").cat.categories
                         for col in self.cat_columns_}
        self.ordered_ = {col: X[col].astype("category").cat.ordered
                         for col in self.cat_columns_}
        self.dummy_columns_ = {col: ["_".join([col, v])
                                     for v in self.cat_map_[col]]
                               for col in self.cat_columns_}
        self.transformed_columns_ = pd.Index(
            self.non_cat_columns_.tolist() +
            list(chain.from_iterable(self.dummy_columns_[k]
                                     for k in self.cat_columns_))
        )
        return self

    def transform(self, X, y = None):
        scaled_cols = pd.DataFrame(self.pipe.transform(X.ix[:, self.non_cat_columns_]),
                                   columns = self.non_cat_columns_).reset_index(drop = True)
        cat_cols = X.drop(self.non_cat_columns_.values, 1).reset_index(drop = True)
        scaled_df = pd.concat([scaled_cols, cat_cols], axis = 1)
        final_matrix = (pd.get_dummies(scaled_df)
                        .reindex(columns = self.transformed_columns_)
                        .fillna(0).as_matrix())
        return final_matrix

This is a utility class for imputing raw input records. A typical input record may include:

JSON

{
  "engineLocation": "front",
  "numDoors": "four",
  "height": 54.3,
  "stroke": 3.4,
  "peakRPM": 5500,
  "horsepower": 102,
  "bore": 3.19,
  "fuelType": "gas",
  "cityMPG": 24,
  "make": "audi",
  "highwayMPG": 30,
  "driveWheels": "fwd",
  "width": 66.2,
  "curbWeight": 2337,
  "fuelSystem": "mpfi",
  "price": 13950,
  "wheelBase": 99.8,
  "numCylinders": "four",
  "bodyStyle": "sedan",
  "engineSize": 109,
  "aspiration": "std",
  "length": 176.6,
  "compressionRatio": 10.0,
  "engineType": "ohc"
}

Many of the features of this record, such as the manufacturer or body style of the car, are categorical, and the numerical variables have not been normalized. Gradient boosting models may work best when all of the input features have been normalized to have zero mean and unit variance.

The FeatureTransformer class performs these imputations using two functions. First, fit trains the FeatureTransformer using the training data. This determines the mean and standard deviation of the training data and rescales the numerical inputs accordingly, as well as converts the categorical entries into collections of dummy variables with one-hot encoding. Fitting the FeatureTransformer is done as part of model training, as discussed below.

The transform function may be used during model scoring to perform streaming imputations of input records. The imputing is done using the information about the mean, variance, and categorical variables determined from the fit function.
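By way of illustration, the two functions might be exercised as follows; this is a sketch that assumes the training file train_data.json used later in this section, with the risk column dropped before fitting.

import pandas as pd

from FeatureTransformer import FeatureTransformer

train = pd.read_json("train_data.json", orient = "records").drop("risk", 1)

ft = FeatureTransformer()
ft.fit(train)                 # learn means, variances, and category maps
matrix = ft.transform(train)  # scaled, one-hot encoded numeric matrix
print(matrix.shape)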

Training the Model.

SciKit Learn may be used to build and train the GBR model (252). First, the following libraries are imported:

Python

import cPickle

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error, make_scorer

from FeatureTransformer import FeatureTransformer

cPickle is used to store the fitted FeatureTransformer, and numpy and pandas perform some manipulation of the input data. Finally, the sklearn libraries are used to train the model. Building and training the model is fairly standard:

Python

# read in training data
in_data = pd.read_json("train_data.json", orient = "records")
X = in_data.drop("risk", 1)
y = np.array(in_data["risk"])

# create feature transformation and training pipeline
preprocess = FeatureTransformer()
gbm = GradientBoostingRegressor(learning_rate = 0.1, random_state = 1234)
pipe = Pipeline([("preprocess", preprocess), ("gbm", gbm)])

# fit model (note the double underscores in the pipeline parameter names)
gbm_cv = GridSearchCV(pipe,
                      dict(gbm__n_estimators = [50, 100, 150, 200],
                           gbm__max_depth = [5, 6, 7, 8, 9, 10]),
                      cv = 5,
                      scoring = make_scorer(mean_squared_error),
                      verbose = 100)
gbm_cv.fit(X, y)

# pickle model
with open("gbmFit.pkl", "wb") as pickle_file:
    cPickle.dump(gbm_cv.best_estimator_, pickle_file)

Note that, because the custom class FeatureTransformer is included as part of our data pipeline, the custom class file FeatureTransformer.py should be included along with the actual pickled object gbmFit.pkl in an attachment.

Scoring New Records.

Once the GBR model is trained, scoring new data is simple:

Python

import cPickle
import json

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import Pipeline

from FeatureTransformer import FeatureTransformer

# load our trained model
with open('gbmFit.pkl', 'rb') as pickle_file:
    gbmFit = cPickle.load(pickle_file)

# each input record is delivered as a string
def score(record):
    datum = json.loads(record)
    score = list(gbmFit.predict(pd.DataFrame([datum]).replace("NA", np.nan)))[0]
    return json.dumps(score)
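By way of illustration, the score function may be invoked with a JSON-encoded record; the record below is abbreviated from the sample input shown earlier, and in practice a complete record with all fields would be supplied.

# abbreviated record for illustration; a real input carries all fields
record = '{"make": "audi", "bodyStyle": "sedan", "horsepower": 102, "price": 13950}'
print(score(record))  # prints a JSON-encoded risk score between -3 and +3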

This model may be adapted essentially without modification for running in FastScore.

Loading the Model in FastScore.

Loading the GBR model (252) to FastScore may be broken into two steps: preparing the model code and creating the input and output streams.

Preparing the Model for FastScore.

In the previous section, a small Python script was created to score incoming auto records using the trained gradient boosting regressor and a custom feature transformer. In this example, the training of the model has already been done, so there is only a need to adapt the trained model to produce scores.

As discussed above, design rules for Python models in FastScore include delivering scores using an action method. Note that the action method operates as a generator, so scores are obtained from yield statements, rather than return statements. Additionally, because reloading the trained model with every score is inefficient, a begin method is defined to do all of the model initialization. Design rules include that if a model defines a begin method, this method will be called at the start of the job. After these alterations, the model abstraction binding (226) conforming to design rules for the analytic model abstraction (224) is:

score_auto_gbm.py

# fastscore.input: gbm_input
# fastscore.output: gbm_output

import cPickle  # unpickle a file
import imp      # load a custom class from the attachment
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import Pipeline

# GBM model
def begin():
    FeatureTransformer = imp.load_source('FeatureTransformer', 'FeatureTransformer.py')
    global gbmFit
    with open("gbmFit.pkl", "rb") as pickle_file:
        gbmFit = cPickle.load(pickle_file)

def action(datum):
    score = list(gbmFit.predict(pd.DataFrame([datum]).replace("NA", np.nan)))[0]
    yield score

To review the design rule changes made between this script, which is ready for scoring in FastScore, and the original one:

-   The input and output schemas have been specified in smart comments at the beginning of the model;
-   The score method has been renamed to action, and all JSON deserialization and serialization of the input and output records is taken care of automatically by FastScore;
-   The logic to load the pickled gbmFit object and any other initialization code is now put in a well-defined begin method, to be executed when the job starts; and
-   With a custom class contained in the attachment, it is loaded using Python's imp module, as opposed to from FeatureTransformer import FeatureTransformer.

Input and Output Schemas.

FastScore may use AVRO schemas to enforce type and/or data science validation on model inputs and outputs. Both the input/output streams, as well as the models themselves, should specify schemas.

The input schema for the data may be complex if the input records contain many fields.

gbm_input.avsc

{
  "type": "record",
  "name": "CarRecord",
  "fields": [
    {"name": "make", "type": "string"},
    {"name": "fuelType", "type": "string"},
    {"name": "aspiration", "type": "string"},
    {"name": "numDoors", "type": "string"},
    {"name": "bodyStyle", "type": "string"},
    {"name": "driveWheels", "type": "string"},
    {"name": "engineLocation", "type": "string"},
    {"name": "wheelBase", "type": "double"},
    {"name": "length", "type": "double"},
    {"name": "width", "type": "double"},
    {"name": "height", "type": "double"},
    {"name": "curbWeight", "type": "int"},
    {"name": "engineType", "type": "string"},
    {"name": "numCylinders", "type": "string"},
    {"name": "engineSize", "type": "int"},
    {"name": "fuelSystem", "type": "string"},
    {"name": "bore", "type": "double"},
    {"name": "stroke", "type": "double"},
    {"name": "compressionRatio", "type": "double"},
    {"name": "horsepower", "type": "int"},
    {"name": "peakRPM", "type": "int"},
    {"name": "cityMPG", "type": "int"},
    {"name": "highwayMPG", "type": "int"},
    {"name": "price", "type": "int"}
  ]
}

The output schema may be much simpler; the output of the model may just be a double between −3 and 3.

gbm_output.avsc

{"type": "double"}

Input and Output Descriptor Abstraction.

Another example feature of FastScore is that it enforces strong type contracts on model inputs and outputs: a model's inputs are guaranteed to match the specified input format, as are its outputs. The same may also be extended to data science constraints such as mean, standard deviation, and probability density function. The input and output streams are described using I/O descriptor abstractions. In this example, Kafka is used to both send and receive scores.

For the output stream, the I/O descriptor abstraction (232) may be simple:

gbm-out.json

{
  "Transport": {
    "Type": "kafka",
    "BootstrapServers": ["127.0.0.1:9092"],
    "Topic": "output"
  },
  "Envelope": null,
  "Encoding": "json",
  "Schema": {"$ref": "gbm_output"}
}

This I/O descriptor specifies that scores may be delivered on the "output" Kafka topic using the Kafka bootstrap server located at 127.0.0.1:9092, and that the scores delivered will be of AVRO type double, as specified in the output schema gbm_output.avsc.

The input stream descriptor includes the more complex schema, encapsulating the various features of the automobile input records. This schema is specified by reference, so that both the model abstraction (224) and the I/O descriptor abstraction (232) point to the same schema. In this way, if there are any changes to the schema, the model and stream descriptor will both use the new schema.

gbm-in.json

{
  "Transport": {
    "Type": "kafka",
    "BootstrapServers": ["127.0.0.1:9092"],
    "Topic": "input"
  },
  "Envelope": null,
  "Encoding": "json",
  "Schema": {"$ref": "gbm_input"}
}

Starting and Configuring FastScore.

Starting up FastScore may be as easy as executing the following command:

docker-compose up -d

Once the FastScore containers are up and running, they may be configured via the CLI (288):

fastscore connect https://dashboard-host:8000
fastscore config set config.yml

where dashboard-host is the IP address of the Dashboard container (if you're running the Dashboard container in host networking mode on your local machine as in the Getting Started Guide, this will just be localhost).

After configuration, the containers may be monitored to see if they are healthy, for example in the CLI (288):

fastscore fleet

Name            API           Health
--------------  ------------  --------
engine-x-1      engine-x      ok
model-manage-1  model-manage  ok

A Note on Kafka:

The instructions above assume a currently configured and running Kafka server set up with topics for the input and output streams, as well as the notify topic used by FastScore for asynchronous notifications. In the example, an additional docker-compose file, such as kafka-compose.yml, may automatically start up Kafka docker containers configured for this example. The Kafka services from this docker-compose file may for example be started with

docker-compose -f kafka-compose.yml up -d

before starting FastScore.

Adding Packages to FastScore.

The model code written uses the pandas and sklearn Python packages, which need to be added to the FastScore Engine (222) container. The code also uses the numpy package, but this is installed in FastScore by default.

To add new packages to the engine container (222), there are two steps:

-   1) Install the package, for example, with pip; and
-   2) Add the package to the list of installed modules.

To install the packages needed, the commands pip install pandas and pip install sklearn should be executed in the engine container (222). For example, using docker-compose:

docker-compose exec engine-1 pip install pandas
docker-compose exec engine-1 pip install sklearn

Next, the new packages that the model uses are added to FastScore's python.modules list. This list is used to check whether or not the current engine (222) possesses the required dependencies for a model before attempting to run the model (252) in the model runner (228). The python.modules file is located inside of the engine container's file system at

/root/engine/lib/engine-fs/priv/runners/python/python.modules

To add the needed modules to the container via docker-compose, the following commands may be executed:

docker-compose exec engine-1 bash -c 'echo pandas >> /root/engine/lib/engine-fs/priv/runners/python/python.modules'
docker-compose exec engine-1 bash -c 'echo sklearn.ensemble >> /root/engine/lib/engine-fs/priv/runners/python/python.modules'
docker-compose exec engine-1 bash -c 'echo sklearn.pipeline >> /root/engine/lib/engine-fs/priv/runners/python/python.modules'

If the container will be reused later, these changes may be saved with the docker commit command so that the packages do not need to be installed again in the future:

docker commit [name of engine container] [name for new engine image]

After committing the new image, a docker-compose file may be updated to use the newly created image.
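By way of illustration, the dependency check described above may be sketched as follows; the one-module-per-line file format follows the echo commands above, while the set of modules the model imports is assumed to have been collected by model inspection.

MODULES_FILE = "/root/engine/lib/engine-fs/priv/runners/python/python.modules"

# modules the engine advertises, one per line
with open(MODULES_FILE) as f:
    available = {line.strip() for line in f if line.strip()}

# modules the model imports (assumed to be gathered by model inspection)
model_imports = {"pandas", "sklearn.ensemble", "sklearn.pipeline"}

missing = model_imports - available
if missing:
    raise RuntimeError("engine lacks required modules: %s" % sorted(missing))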

Creating the Attachment.

In this section, it is assumed that the model file score_auto_gbm.py has been created, as well as the input and output stream descriptors gbm-in.json and gbm-out.json, the pickled FeatureTransformer gbmFit.pkl, and the FeatureTransformer module FeatureTransformer.py.

Once these files have been created, the FeatureTransformer class and pickled object may be packaged into a .zip or .tar.gz archive. This archive should contain:

FeatureTransformer.py

gbmFit.pkl

The attachment may be given an arbitrary name; here it is named gbm.tar.gz.

Adding the Model and Stream Descriptors.

Now that the model, stream descriptors, schemas, and attachment have been created, they may be added to FastScore. This may be done through the command line (288), or using Dashboard (290). From the command line (288), adding the schemas and stream descriptors may be accomplished with:

fastscore schema add gbm_input gbm_input.avsc
fastscore schema add gbm_output gbm_output.avsc
fastscore stream add GBM-in gbm-in.json
fastscore stream add GBM-out gbm-out.json

and adding the model and attachment may be accomplished with:

fastscore model add GBM score_auto_gbm.py
fastscore attachment upload GBM gbm.tar.gz

After adding the model, attachment, and streams to FastScore, they may be viewed from the FastScore Dashboard (290).

Delivering Scores Using Kafka.

The final step is to run the model, and deliver input records and output scores with Kafka. Kafka producers and consumers may be implemented in many languages. In the example code attached to this tutorial, a simple Scala Kafka client kafkaesq is used, which streams the contents of a file line-by-line over a specified input topic, and then prints any responses received on a specified output topic. FastScore, however, is compatible with any implementation of Kafka producer/consumer.
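By way of illustration, an equivalent client may be sketched with the kafka-python package; the input file name is an assumption, and the topics and bootstrap server mirror the stream descriptors above.

from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(bootstrap_servers = "127.0.0.1:9092")
consumer = KafkaConsumer("output", bootstrap_servers = "127.0.0.1:9092")

# stream the input file line-by-line over the input topic
with open("input_records.json") as f:
    for line in f:
        producer.send("input", line.strip().encode("utf-8"))
producer.flush()

# print any scores received on the output topic
for message in consumer:
    print(message.value.decode("utf-8"))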

After FastScore is configured, the system is ready to start scoring, which may be commenced from the CLI (288) with

fastscore job run GBM GBM-in GBM-out

Using the included Kafka client script, scoring a file may be accomplished with:

python kafkaesq --input-file /path/to/input/file.json input output

At this point, the job may be stopped with fastscore job stop.

FIG. 6 is a flow chart illustrating an embodiment of a process for an analytic model execution. In one embodiment, the process of FIG. 6 is carried out by the system of FIGS. 2A, 2B, 2C, 2D, 2E, and/or 2F.

In step 602, at an interface an analytic model (226) is received for processing data. In step 604, the analytic model is inspected to determine a language, an action, an input type, and an output type. For example, the analytic model (226) may be inspected to conform to an analytic model abstraction (224) before binding. In step 606, a VEE is generated for an analytic engine that includes executable code (228) to implement the analytic model for processing an input data stream. In optional step 608, a model and/or stream sensor (272) is added. In one embodiment, the sensor (272) is added to instrument the VEE for the analytic engine (222), wherein the sensor (272) provides metrics for monitoring, testing, statistically analyzing, and/or debugging a performance of the analytic model.

In one embodiment, the analytic model is implemented using a container (222) to provide the VEE for the analytic engine. In one embodiment, the container (222) is a portable and independently executable microservice.

In one embodiment, the analytic model includes an input configuration schema (232) that specifies an input type, an output configuration schema (232) that specifies an output type, and an I/O descriptor abstraction that specifies a stream type.

In one embodiment, the VEE for the analytic engine (222) includes a plurality of runtime engines (254, 256, 258, 260) that each support a distinct analytic model programming language.

In one embodiment, the interface for receiving the analytic model for processing data includes an API and/or SDK (286), a CLI (288), and/or a dashboard interface (290).

FIG. 7 is a flow chart illustrating an embodiment of a process for inspecting an analytic model (226). In one embodiment, the flow of FIG. 7 is included in step 604 of FIG. 6.

In step 702, a language is determined from the analytic model, and may be one of the following: C, Python, Java, R, S, SAS, PFA, H2O, PMML, SPSS, Mathematica, Maple, and MATLAB. In one embodiment, the language is determined based on the file extension. In step 704, the analytic model is interpreted to determine code points for the beginning and end of the model execution framework and/or for input and output of data, for example looking for a main() loop, for begin and end code snippets, and/or for action, emit, and/or yield keywords, depending on the language determined in step 702.
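By way of illustration, the extension-based language determination of step 702 may be sketched as follows; the extension-to-language table is illustrative rather than exhaustive, and ambiguous extensions may require inspecting the model contents.

import os

# illustrative mapping only; not an exhaustive or normative table
EXTENSION_LANGUAGES = {
    ".py": "Python", ".R": "R", ".java": "Java", ".c": "C",
    ".pfa": "PFA", ".ppfa": "PrettyPFA", ".pmml": "PMML", ".sas": "SAS",
}

def detect_language(filename):
    _, ext = os.path.splitext(filename)
    try:
        return EXTENSION_LANGUAGES[ext]
    except KeyError:
        raise ValueError("cannot determine model language for %r" % filename)

print(detect_language("score_auto_gbm.py"))  # -> Python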

In step 706, the analytic model is interpreted to determine whether state is to be saved externally, for example using snapshots as described above. In step 708, the analytic model is interpreted to determine whether state is to be shared, for example using cells and/or pools as described above. In step 710, the analytic model is interpreted to determine state initialization and state management. In step 712, the analytic model is interpreted to determine concurrency controls, for example for scaling and/or parallelization. In step 714, the analytic model is interpreted to determine safety controls, reliability controls, and/or checkpointing. In step 716, the analytic model is interpreted to determine post-execution clean-up.

FIG. 8 is a flow chart illustrating an embodiment of a process for generating a virtualized execution environment (222) for an analytic model. In one embodiment, the flow of FIG. 8 is included in step 606 of FIG. 6.

In optional step 802, a first programming language, for example R, Python, Java, and/or C, of the analytic model (252) for processing data is translated to a first analytic model programming language, for example PFA and/or PPFA, to generate the executable code to implement the analytic model for processing the input data stream.

In step 804, the executable code (262, 264) to implement the analytic model for processing the input data stream is routed to one of the plurality of runtime engines (254, 256, 258, 260) based on the first analytic model language.

In step 806, an analytic model abstraction (224) and/or an I/O descriptor abstraction (232) go through a binding stage. For example, an engine (222) binds an analytic model M1 (226, 252) with an I/O descriptor S1 based at least in part on the schema shared by M1 and S1. The engine (222) then binds M1 with a model runner (228) associated with the engine (222), and S1 with an input port (236). The engine (222) may check the input schema in M1 against that of S1 for conformity and flag an exception if they do not match.

In step 808, an input data stream is received at a stream processor (266). In step 810, the input data stream is processed using the executable code (262, 264) that implements the analytic model, wherein the stream processor (266) enforces the input type, for example that of the I/O descriptor S1 in the example above. In step 812, an output data stream is generated using the stream processor (266) based on the output type, wherein the output data stream includes a score and/or a metric.
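By way of illustration, the conformity check of the binding stage may be sketched as a comparison of named schema references; real compatibility checking is richer (for example, structural AVRO compatibility), and the function name here is hypothetical.

def bind_check(model_input_schema, stream_schema):
    # flag an exception if the model and stream schemas do not match
    if model_input_schema != stream_schema:
        raise ValueError("schema mismatch: model expects %r, stream provides %r"
                         % (model_input_schema, stream_schema))

bind_check({"$ref": "gbm_input"}, {"$ref": "gbm_input"})      # conforms
# bind_check({"$ref": "gbm_input"}, {"$ref": "named-array"})  # raises ValueError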

FIG. 9 is a flow chart illustrating an embodiment of a process for adding a model sensor and/or a stream sensor. In one embodiment, the flow of FIG. 9 is included in step 608 of FIG. 6.

As described above, sensors (272) are configurable functions that may be bound to either the model and/or a data stream, and may be expressed for example in JSON. In step 902, the sensor filter is determined, that is, whether the sensor senses information related to datatype, for example data bounds and/or ranges, or information related to statistical measures, for example mean, median, standard deviation, variance, pdf, and/or cdf.

In step 904, a sensor sampling frequency is determined, for example sampling two times a second as in the example above. In step 906, a sensor reporting frequency is determined, for example reporting every three seconds as in the example above.
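By way of illustration, the three choices above (filter, sampling frequency, and reporting frequency) may be captured in a JSON sensor descriptor along the following lines; the field names here are illustrative assumptions rather than a normative FastScore schema.

import json

sensor = {
    "Tap": "manifold.input.records",  # binding point for the sensor (assumed name)
    "Filter": {"Type": "statistics", "Measures": ["mean", "stddev"]},
    "Activate": {"Type": "regular", "Interval": 0.5},  # sample twice a second
    "Report": {"Interval": 3.0},                       # report every three seconds
}
print(json.dumps(sensor, indent = 2))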

FIG. 10 is a flow chart illustrating an embodiment of a process for a dynamically configurable microservice model for data analysis. In one embodiment, the process of FIG. 10 is carried out by the system of FIGS. 2A, 2B, 2C, 2D, 2E, and/or 2F.

In step 1002, a VEE is generated for an analytic engine that includes executable code to implement an analytic model for processing an input data stream. In one embodiment, the analytic model is implemented using a container to provide the VEE for the analytic engine, and the container is a portable and independently executable microservice, for example a web service with a RESTful API.

In step 1004, a configuration for the analytic model is received at an interface. In one embodiment, the interface for receiving includes an API and/or SDK (286), a CLI (288), and/or a dashboard interface (290). In one embodiment, the analytic model includes an input configuration schema that specifies an input type, an output configuration schema that specifies an output type, and an I/O descriptor abstraction that specifies a stream type.

In step 1006, the VEE for the analytic engine is dynamically configured at runtime based on the configuration for the analytic model. In one embodiment, the VEE for the analytic engine is dynamically configurable based upon receiving the configuration for the analytic model that includes an addition, modification, and/or removal of the analytic model, the input configuration schema, the output configuration schema, and/or the stream configuration descriptor.

FIG. 11 is a flow chart illustrating an embodiment of a process for dynamic sensors (272). In one embodiment, the process of FIG. 11 may be carried out by a model (252) with sensors (272).

In optional step 1102, a sensor (272) is added dynamically, for example at run-time and/or debug-time. As described above, the added sensor may instrument the VEE for the analytic engine, wherein the sensor provides metrics for monitoring, testing, and/or debugging a performance of the analytic model.

In optional step 1104, the dynamic sensor (272) from step 1102 is also dynamically reconfigurable, for example at run-time and/or debug-time. Examples of reconfiguring a dynamic sensor comprise changing a sensor parameter and/or changing a sensor threshold.

In optional step 1106, the dynamic sensor (272) from step 1102 is dynamically removable, for example at run-time and/or debug-time. Sensors may decrease performance and/or no longer be required for debugging once a bug is eradicated, and so removing the sensor may improve performance and/or efficiency.

FIG. 12 is a flow chart illustrating an embodiment of a process for dynamic configuration of a VEE (222). In one embodiment, the process of FIG. 12 may be included in step 1006 of FIG. 10.

In step 1202, an update to a configuration of the VEE is received. Examples of an update to the configuration include: changing an included library, a modification to a cloud execution environment, and/or other changes to the engine (222) and/or model environment.

In step 1204, the VEE for the analytic engine may be dynamically reconfigured based on the update to the configuration of the VEE, for example at run-time and/or debug-time.

FIG. 13 is a flow chart illustrating an embodiment of a process for deployment and management of model execution engines. In one embodiment, the process of FIG. 13 is carried out by the system of FIGS. 2A, 2B, 2C, 2D, 2E, and/or 2F.

In step 1302, a first analytic model for processing data and a second analytic model for processing data are received at an interface. In one embodiment, the interface includes an API and/or SDK (286), a CLI (288), and/or a dashboard interface (290).

In step 1304, a first VEE is generated for a first analytic engine that includes executable code to implement the first analytic model for processing a first input data stream. In step 1306, a second VEE is generated for a second analytic engine that includes executable code to implement the second analytic model for processing a second input data stream.

In one embodiment, the first analytic model is implemented using a first container (202, 222) to provide the first virtualized execution environment for the first analytic engine, wherein the second analytic model is implemented using a second container (216, 222) to provide the second virtualized execution environment for the second analytic engine, and wherein the first container and the second container are each dynamically scalable. In one embodiment, the first and second analytic models are stored using a Model Manager (210). In one embodiment, the first container (202, 222) and the second container (216, 222) are each a portable and independently executable microservice.

In step 1308, the first VEE for the first analytic engine and the second VEE for the second analytic engine are deployed. For example, they may be deployed using a fleet controller (212). In one embodiment, they may be deployed in a pipeline as shown in FIG. 2A and/or FIG. 3, as an example.

In one embodiment, deployment of the first VEE for the first analytic engine and the second VEE for the second analytic engine in a pipeline comprises complex cloud analytic workflows that may be cloud portable, multi-cloud, hybrid cloud, system portable, and/or language neutral.

In optional step 1310, state information is shared between the first VEE and the second VEE. In one embodiment, state information is shared between the first VEE for the first analytic engine and the second VEE for the second analytic engine, wherein the first virtualized execution environment for the first analytic engine and the second virtualized execution environment for the second analytic engine are executed concurrently or sequentially.

In optional step 1312, dynamic scaling of a VEE is performed based on a sensor measurement. For example, the first VEE for the first analytic engine (202, 222) may implement a concurrency model, wherein the first VEE for the first analytic engine (202, 222) includes a sensor for instrumenting the first VEE for the first analytic engine. Then, dynamic scaling of the first VEE for the first analytic engine (202, 222) is performed based on the concurrency model and a measurement detected using the sensor, for example to deploy additional containers for the first analytic engine to be executed in parallel, as described above for an R model that may be made more efficient through parallelization.
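By way of illustration, the scaling decision of step 1312 may be sketched as follows; the helper and its thresholds are hypothetical, and actual deployment of additional containers would be carried out by the fleet controller (212).

def autoscale(sensor_reading, threshold, current_replicas, max_replicas):
    # deploy one more engine container while the sensed load exceeds the
    # threshold and the concurrency model allows further parallelism
    if sensor_reading > threshold and current_replicas < max_replicas:
        return current_replicas + 1
    return current_replicas

print(autoscale(sensor_reading = 950.0, threshold = 800.0,
                current_replicas = 2, max_replicas = 8))  # -> 3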

FIG. 14 is a flow chart illustrating an embodiment of a process for redeployment of model execution engines. In one embodiment, the process of FIG. 14 is carried out by the system of FIGS. 2A, 2B, 2C, 2D, 2E, and/or 2F.

In step 1402, an update to a configuration of the first VEE for the first analytic engine (202, 222) is received. An example of an update would be a change to a cloud execution environment.

In step 1404, the first VEE for the first analytic engine (202, 222) is dynamically redeployed to a different computing execution environment at, for example, run-time and/or debug-time. For example, if the update is a change to the cloud execution environment, then the engine (202) may be redeployed from off-premises to a cloud, an enterprise data center, and/or a hybrid cloud.

Further examples of redeployment include dynamically redeploying the engine (202) from Azure to AWS or Google Cloud, or moving a subset of engines within a pipeline from Azure to AWS or Google. This redeployment may be based on which environment has available GPU support, as determined by sensors while testing/executing in Azure, as described in FIG. 3.

Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.

What is claimed is:
1. A system, comprising: a processor configured to: receive at an interface an analytic model for processing data; inspect the analytic model to determine a language, an action, an input type, and an output type; and generate a virtualized execution environment for an analytic engine that includes executable code to implement the analytic model for processing an input data stream; and a memory coupled to the processor and configured to provide the processor with instructions.
2. The system recited in claim 1, wherein the analytic model is implemented using a container to provide the virtualized execution environment for the analytic engine.
3. The system recited in claim 1, wherein the analytic model is implemented using a container to provide the virtualized execution environment for the analytic engine, and wherein the container is a portable and independently executable microservice.
4. The system recited in claim 1, wherein the analytic model includes an input configuration schema that specifies an input type, an output configuration schema that specifies an output type, and an I/O descriptor abstraction that specifies a stream type.
5. The system recited in claim 1, wherein the virtualized execution environment for the analytic engine includes a plurality of runtime engines that each supports a distinct analytic model programming language.
6. The system recited in claim 1, wherein the analytic model includes an input configuration schema that specifies an input type, an output configuration schema that specifies an output type, and a stream configuration descriptor that specifies a stream type, and wherein the processor is further configured to: bind the analytic model and the stream descriptor; receive the input data stream at a stream processor; process the input data stream using the executable code that implements the analytic model, wherein the stream processor enforces the input type; and generate an output data stream using the stream processor based on the output type, wherein the output data stream includes a score and/or a metric.
7. The system recited in claim 1, wherein the interface for receiving the analytic model for processing data includes an Application Programming Interface (API), a Command Line Interface (CLI), and/or a dashboard interface.
8. The system recited in claim 1, wherein the analytic model for processing data is coded in a first programming language, and wherein the virtualized execution environment includes a plurality of runtime engines that each supports a distinct analytic model programming language, and wherein the processor is further configured to: translate the first programming language of the analytic model for processing data to a first analytic model programming language to generate the executable code to implement the analytic model for processing the input data stream; and route the executable code to implement the analytic model for processing the input data stream to one of the plurality of runtime engines based on the first analytic model language.
9. The system recited in claim 1, wherein the processor is further configured to: add a sensor to instrument the virtualized execution environment for the analytic engine, wherein the sensor provides metrics for monitoring, testing, statistically analyzing, and/or debugging a performance of the analytic model.
10. The system recited in claim 1, wherein the processor is further configured to: inspect the analytic model to determine code points for one or more of the following: state initialization, concurrency controls, state management, safety and reliability controls, beginning and end of model execution framework for input and output of data, and post-execution clean-up.
11. A method, comprising: receiving at an interface an analytic model for processing data; inspecting the analytic model to determine a language, an action, an input type, and an output type; and generating a virtualized execution environment for an analytic engine that includes executable code to implement the analytic model for processing an input data stream.
12. The method of claim 11, wherein the analytic model is implemented using a container to provide the virtualized execution environment for the analytic engine.
13. The method of claim 11, wherein the analytic model is implemented using a container to provide the virtualized execution environment for the analytic engine, and wherein the container is a portable and independently executable microservice.
14. The method of claim 11, wherein the analytic model includes an input configuration schema that specifies an input type, an output configuration schema that specifies an output type, and a stream configuration descriptor that specifies a stream type.
15. The method of claim 11, wherein the virtualized execution environment for the analytic engine includes a plurality of runtime engines that each supports a distinct analytic model language.
16. A computer program product, the computer program product being embodied in a tangible computer readable storage medium and comprising computer instructions for: receiving at an interface an analytic model for processing data; inspecting the analytic model to determine a language, an action, an input type, and an output type; and generating a virtualized execution environment for an analytic engine that includes executable code to implement the analytic model for processing an input data stream.
17. The computer program product recited in claim 16, wherein the analytic model is implemented using a container to provide the virtualized execution environment for the analytic engine.
18. The computer program product recited in claim 16, wherein the analytic model is implemented using a container to provide the virtualized execution environment for the analytic engine, and wherein the container is a portable and independently executable microservice.
19. The computer program product recited in claim 16, wherein the analytic model includes an input configuration schema that specifies an input type, an output configuration schema that specifies an output type, and a stream configuration descriptor that specifies a stream type.
20. The computer program product recited in claim 16, wherein the virtualized execution environment for the analytic engine includes a plurality of runtime engines that each supports a distinct analytic model language.